Re: Store input text after analyzers and token filters

2010-03-15 Thread JCodina

For Solr 1.4 it is basically the same, but IndexSchema (org.apache.solr.schema.IndexSchema)
needs to be updated to include the method
getFieldTypeByName(String fieldTypeName), which is already in Solr 1.5:

  /**
   * Given the name of a {@link org.apache.solr.schema.FieldType} (not to be
   * confused with {@link #getFieldType(String)}, which takes the name of a
   * field), return the {@link org.apache.solr.schema.FieldType}.
   * @param fieldTypeName The name of the {@link org.apache.solr.schema.FieldType}
   * @return The {@link org.apache.solr.schema.FieldType} or null.
   */
  public FieldType getFieldTypeByName(String fieldTypeName) {
    return fieldTypes.get(fieldTypeName);
  }

Then the AnalyzedField is a bit different, but it is basically a copy of the
TextField as it is in 1.4:

http://old.nabble.com/file/p27902273/AnalyzedField.java AnalyzedField.java 
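
In case the attached file is unavailable, here is a minimal sketch of the idea,
assuming Solr 1.4 plus the getFieldTypeByName() patch above and the Lucene 2.9
attribute API; the "analyzedType" init argument and the space-joined
serialization are my own assumptions, and the attached class may differ:

  import java.io.IOException;
  import java.io.StringReader;
  import java.util.Map;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;
  import org.apache.solr.schema.FieldType;
  import org.apache.solr.schema.IndexSchema;
  import org.apache.solr.schema.TextField;

  public class AnalyzedField extends TextField {
    private IndexSchema schema;
    private String sourceTypeName;

    @Override
    protected void init(IndexSchema schema, Map<String, String> args) {
      this.schema = schema;
      // Hypothetical init argument naming the field type whose analyzer to run.
      this.sourceTypeName = args.remove("analyzedType");
      super.init(schema, args);
    }

    // Run the source type's index-time analyzer over the raw value and keep
    // the space-joined tokens as the value that gets stored and indexed.
    @Override
    public String toInternal(String val) {
      FieldType source = schema.getFieldTypeByName(sourceTypeName);
      StringBuilder out = new StringBuilder();
      try {
        // The field name argument is ignored by Solr's TokenizerChain.
        TokenStream ts = source.getAnalyzer().tokenStream(null, new StringReader(val));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          if (out.length() > 0) out.append(' ');
          out.append(term.term());
        }
        ts.close();
      } catch (IOException e) {
        throw new RuntimeException("Analysis with type " + sourceTypeName + " failed", e);
      }
      return out.toString();
    }
  }

Because toInternal() runs before the value is stored, both the indexed tokens
and the stored text come out analyzed, which is what this thread is after.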



Re: Store input text after analyzers and token filters

2010-03-09 Thread JCodina

Otis,
I've been thinking about it, trying to figure out the possible solutions:
- try to solve it with a bridge between Solr and clustering
- try to solve it before/during indexing

The second option is of course better for performance, but how to do it?

I think a good option may be to create a new type derived from the FieldType
class, like SortableIntField, which has the toInternal(String val) function.
The problem then is how to include the result of the analysis of another
field type in the toInternal function.

So there would be a new type that can be used on copy fields, one that takes
the analysis of the source field and injects it into the value. It takes as a
parameter the field type from which to take the analysis.

So, how can I get the result of the analysis of a given text by a given
field using internal functions?





Otis Gospodnetic wrote:
 
 Hi Joan,
 
 You could use the FieldAnalysisRequestHandler:
 http://www.search-lucene.com/?q=FieldAnalysisRequestHandler
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 



Store input text after analyzers and token filters

2010-03-05 Thread JCodina


In a stored field, the content stored is the raw input text.
But when the analyzers perform some cleaning or an interesting transformation
of the text, it could be useful to store the text after the tokenizer/filter
chain. Is there a way to do this, to be able to get back the text of the
document after it has been processed?

thanks
Joan



Re: Store input text after analyzers and token filters

2010-03-05 Thread JCodina

Thanks,
it can be useful as a workaround, but I get a vector, not a result that I can
use wherever I could use the stored text. I'm thinking of clustering.


Ahmet Arslan wrote:
 
 In a stored field, the content stored is the raw input text.
 But when the analyzers perform some cleaning or interesting transformation
 of the text, then it could be interesting to store the text after the
 tokenizer/filter chain. Is there a way to do this? To be able to get back
 the text of the document after being processed?
 
 You can get term vectors [1] of analyzed text.
 
 Also you can see analyzed text in solr/admin/analysis.jsp if you copy and
 paste sample text data.
 
 [1] http://wiki.apache.org/solr/TermVectorComponent 
 
 
   
 
 




Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina

I'm trying to use Carrot2 (for now I started with the workbench) and I can
cluster any field, but the text used for clustering is the original raw
text, the one that was submitted for indexing, without any of the processing
performed by the tokenizer or filters. So I get stop words.
I also built shingles (after filtering by POS) and I cannot cluster using
these multiwords.
So my question is how to get the indexed text in a query answer instead of
the original one, because if I set stored to false, the search does not
return the content of the field.

Thanks in advance

Joan



error in sum function

2010-03-03 Thread JCodina

The sum function, or the map one, is not parsed correctly.
This sort works like a charm:
sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
but

sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc

gives the following exception

SEVERE: org.apache.solr.common.SolrException: Must declare sort field or function
        at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
        at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
        at org.apache.solr.search.QParser.getSort(QParser.java:217)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:86)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

You can test it here using these two URLs:

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29

http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena







Re: error in sum function

2010-03-03 Thread JCodina


Ok, solved!!!

Joan

Koji Sekiguchi-2 wrote:
 
 Can you try the latest trunk? I have just fixed it in the last couple of days.
 
 Koji Sekiguchi from mobile
 
 
 On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote:
 

 the sum function or the map one are not parsed correctly,
 doing this sort, works as a charm...
 sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc
 but

 sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc

 gives the following exception

 SEVERE: org.apache.solr.common.SolrException: Must declare sort field or function
        at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
        at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
        at org.apache.solr.search.QParser.getSort(QParser.java:217)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:86)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

 You can test it here using these two URLs:

 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&fl=id,Num,score&sort=score%20asc,sum%28map%28Num,0,5000,42000%29,Num%29+asc&q=+entities_org:%28%22Amena%22%29

 http://varoitus.barcelonamedia.org:8180/fbm-ijec/select?indent=on&wt=php&fl=id,Num,score&rows=50&sort=score+desc,sum%28Num,map%28Num,0,2000,42000%29%29+asc&q=+entities_org:Amena





 
 




Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina

Thanks Staszek,
I'll give the stopwords treatment a try, but the problem is that we perform
POS tagging and then use payloads to keep only nouns and adjectives, and we
thought it could be interesting to perform clustering only with these
elements, to avoid senseless words.

Of course this is a clustering problem, but maybe it is also a feature that
could be interesting to have in Solr: to index not the raw input text but the
analyzed one, so stored could be False | Raw | Analyzed.


Stanislaw Osinski-2 wrote:
 
 Hi Joan,
 
 I'm trying to use  carrot2 (now I started with the workbench) and I can
 cluster any field, but, the text used for clustering is the original raw
 text, the one that was indexed, without any of the processing performed
 by
 the tokenizer or filters.
 So I get stop words.

 
  The easiest way to fix this is to update the stop words list used by
  Carrot2; see http://wiki.apache.org/solr/ClusteringComponent, the
  "Tuning Carrot2 clustering" section at the bottom.
 
  If you want to get readable cluster labels, it's best to feed the raw text
  to clustering (cluster labels are phrases taken from the input text; if you
  remove stopwords and stem everything, the phrases will become unreadable).
 
 Cheers,
 
 Staszek
 
 




Re: Solr and UIMA

2010-03-02 Thread JCodina

You can test our UIMA-to-Solr CAS consumer. It is based on JulieLab's Lucas
and uses their CAS, but it has been transformed to generate XML which can be
saved to a file or posted directly to Solr.
In the map file you can define which information is generated for each
token and how it is concatenated, allowing the generation of things like
the|AD car|NC, which can then be processed using payloads (a short demo below).
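
As a hedged illustration of the consuming side (not part of the CAS consumer
itself), this is roughly how that token|POS format can be read back with the
payload classes of Lucene 2.9, which ships with Solr 1.4; the class name
PayloadDemo is mine:

  import java.io.StringReader;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceTokenizer;
  import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
  import org.apache.lucene.analysis.payloads.IdentityEncoder;
  import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public class PayloadDemo {
    public static void main(String[] args) throws Exception {
      // "the|AD car|NC" -> tokens "the" and "car", POS tags kept as payloads.
      TokenStream ts = new DelimitedPayloadTokenFilter(
          new WhitespaceTokenizer(new StringReader("the|AD car|NC")),
          '|', new IdentityEncoder());
      TermAttribute term = ts.addAttribute(TermAttribute.class);
      PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        // Prints: the -> AD, then: car -> NC
        System.out.println(term.term() + " -> "
            + new String(payload.getPayload().getData(), "UTF-8"));
      }
      ts.close();
    }
  }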

You can now get it from my page:
http://www.barcelonamedia.org/personal/joan.codina/en





Re: Solr and UIMA

2010-02-11 Thread JCodina

Things are done :-)

We have now finished the UIMA CAS consumer for Solr;
we are making it public, more news soon.

We have also been developing some filters based on payloads (a sketch of the
first one follows). One of the filters removes tokens whose payloads are in a
given list; the other one keeps only those tokens whose payloads are in the
list. It works the same way as the StopFilterFactory.
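
A minimal sketch of the first filter against the Lucene 2.9 API, assuming the
payloads are short UTF-8 strings such as POS tags; the names PayloadStopFilter
and stopPayloads are mine, and the published code may differ:

  import java.io.IOException;
  import java.util.Set;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
  import org.apache.lucene.index.Payload;

  public final class PayloadStopFilter extends TokenFilter {
    private final Set<String> stopPayloads;   // payloads to reject, e.g. {"AD", "PR"}
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

    public PayloadStopFilter(TokenStream input, Set<String> stopPayloads) {
      super(input);
      this.stopPayloads = stopPayloads;
    }

    @Override
    public boolean incrementToken() throws IOException {
      while (input.incrementToken()) {
        Payload p = payloadAtt.getPayload();
        String tag = (p == null) ? "" : new String(p.getData(), "UTF-8");
        if (!stopPayloads.contains(tag)) {
          return true;                        // keep tokens whose payload is not listed
        }
      }
      return false;                           // input stream exhausted
    }
  }

The second filter would be the same loop with the condition inverted.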

You can find it at my page:
http://www.barcelonamedia.org/personal/joan.codina/en




Re: Solr and UIMA

2009-07-24 Thread JCodina




On Jul 21, 2009, at 11:57 AM, JCodina wrote:

Let me summarize:

We (well, I think Grant?) make changes to the DPTFF
(DelimitedPayloadTokenFilterFactory) so that it is able to index at the same
position different tokens that may have payloads:
1. token delimiter (#)
2. payload delimiter (|)

We (that's me) build SolCAS: a UIMA CAS consumer equivalent to Lucas, but one
that allows indexing using Solr. This SolCAS is able to generate different
tokens at the same position, possibly with payloads; the result is ready for
the new DPTFF.

We (me again) develop some filtering utilities based on the payload,
something like the stopwords filter, but instead of rejecting the tokens that
are in the stopwords list it rejects those whose payloads are in the
payloads list.

We will also try to develop an n-gram generator based on the payloads, for
example to find nouns followed by an adjective at less than 4 positions'
distance.

For the moment, searches cannot be performed based on payloads, not even as a
filter... but this is a matter of time.

Problems to solve: perform a nice processing of the N tokens that share the
same position, as tokenizer.next() will not give them together (which is a
pity). Write some utility that would allow the tools that manage multi-tokens
to have a similar front end and back end: the front end does multiple next()
calls in order to put together all the information at the same position, the
treatment is performed on a multi-token structure, and the result is a
multi-token that is handed to the back end, which again exposes next() over
single tokens...

Joan



Re: Lemmatisation support in Solr

2009-07-21 Thread JCodina

I think that to get the best results you need some kind of natural language
processing. I'm trying to do so using UIMA, but I need to integrate it with
Solr, as I explain in this post:
http://www.nabble.com/Solr-and-UIMA-tc24567504.html


prerna07 wrote:
 
 Hi,
 
 I am implementing lemmatisation in Solr, which means that if a user looks
 for Mouse then it should display results for both Mouse and Mice. I
 understand that this is a kind of context search. I thought of using
 synonyms for this, but then synonyms.txt would hold a great many records,
 and they would keep accumulating.
 
 Please suggest how I can implement it in some other way.
 
 Thanks,
 Prerna
 




Re: Solr and UIMA

2009-07-21 Thread JCodina

Hello Grant,
there are two ways to implement this: one is payloads, and the other is
multiple tokens at the same position. Each of them can be useful; let me
explain the way I think they can be used.

Payloads: every token has extra information that can be used in the
processing. For example, if I can add part-of-speech tags, then I can develop
tokenizers that take the POS into account (for example I can generate bigrams
of Noun Adjective or Noun prep Noun, or I can have a better stopwords
algorithm).

Multiple tokens in one position: if I can have different tokens at the same
place, I can attach different pieces of information, like was #verb _be, so I
can do a search for you _be #adjective to find all the sentences that talk
about you, for example you were clever, you are tall...

I have not understood how the DelimitedPayloadTokenFilterFactory works in
Solr; what is its input format?

So I was thinking of generating an XML where a single string is generated for
each token, like was#verb#be, and then a token filter splits each
whitespace-separated string by #, in this case into three words, and adds the
marker character that allows searching for the right semantic info, but gives
them the same position increment (a sketch follows below). Of course the full
processing chain must be aware of this. And I still must think about
multiword tokens.
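
A hedged sketch of such a filter against the Lucene 2.9 API (the class name
HashSplitFilter is mine, and the marker-character handling is left out): it
splits was#verb#be into three tokens at the same position by giving the extra
tokens a position increment of 0.

  import java.io.IOException;
  import java.util.LinkedList;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public final class HashSplitFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt =
        addAttribute(PositionIncrementAttribute.class);
    private final LinkedList<String> pending = new LinkedList<String>();

    public HashSplitFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!pending.isEmpty()) {
        termAtt.setTermBuffer(pending.removeFirst());
        posIncrAtt.setPositionIncrement(0);   // stack on top of the previous token
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      String[] parts = termAtt.term().split("#");
      termAtt.setTermBuffer(parts[0]);        // surface form keeps its normal increment
      for (int i = 1; i < parts.length; i++) {
        pending.add(parts[i]);                // POS tag, lemma, ... emitted at the same position
      }
      return true;
    }
  }

A production version would also capture and restore the remaining attributes
(offsets, type) with captureState()/restoreState() when emitting the stacked
tokens.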


Grant Ingersoll-6 wrote:
 
 
 On Jul 20, 2009, at 6:43 AM, JCodina wrote:
 
 D: Break things down. The CAS would only produce XML that solr can process.
 Then different Tokenizers can be used to deal with the data in the CAS. The
 main point is that the XML has the doc and field labels of solr.
 
 I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is
 along the lines of what you are thinking, but I haven't done all that much
 with UIMA.
 
 I also suspect the Tee/Sink capabilities of Lucene could be helpful, but
 they aren't available in Solr yet.
 
 
 
 
 E: The set of capabilities to process the xml is defined in XML, similar to
 Lucas to define the output, and in the solr schema to define how this is
 processed.
 
 I want to use it in order to index something that is common but I can't get
 any tool to do that with solr: indexing a word and coding at the same
 position the syntactic and semantic information. I know that in Lucene this
 is evolving and it will be possible to include metadata but for the moment
 
 What does Lucas do with Lucene? Is it putting multiple tokens at the same
 position or using Payloads?
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 




Solr and UIMA

2009-07-20 Thread JCodina

We are starting to use UIMA as a platform to analyze text.
The result of analyzing a document is a UIMA CAS. A CAS is a generic data
structure that can contain different data.
UIMA processes single documents: it gets the documents from a CAS producer,
processes them using a pipe that the user defines, and finally sends the
result to a CAS consumer, which saves or stores it.
The pipe is thus a chain of different tools that annotate the text with
different information. Different sets of tools are available out there, each
of them defining its own data types that are included in the CAS. To form a
pipe, the output and input CAS of the elements to connect need to be
compatible.

There is a CAS consumer that feeds a Lucene index, called Lucas, but I was
looking at it and I prefer to use UIMA connected to Solr. Why?
A: I know Solr ;-) and I like it.
B: I can configure the fields and their processing in Solr using XML. Once
that is done, I have it ready to use with a set of tools that let me easily
explore the data.
C: It is easier to use Solr as a web service that may receive docs from
different UIMA instances (natural language processing is CPU intensive).
D: Break things down. The CAS consumer would only produce XML that Solr can
process. Then different tokenizers can be used to deal with the data in the
CAS; the main point is that the XML carries the doc and field labels of Solr.
E: The set of capabilities to process the XML is defined in XML: similarly to
Lucas to define the output, and in the Solr schema to define how it is
processed.


I want to use it in order to index something that is common but that I can't
get any tool to do with Solr: indexing a word and encoding at the same
position its syntactic and semantic information. I know that in Lucene this
is evolving and it will become possible to include metadata, but not for the
moment.


So my idea is first to produce a UIMA CAS consumer that POSTs an XML file
containing the plain text of the document to Solr, then to modify it to
include multiple fields, and to start encoding the semantic information (a
sketch follows).
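
As a sketch of the Solr-facing side of that consumer, assuming SolrJ (the
Java client that ships with Solr) is acceptable instead of a hand-built HTTP
POST of the XML; the field names are placeholders:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrCasConsumerSketch {
    public static void main(String[] args) throws Exception {
      SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-0001");                 // placeholder document id
      doc.addField("content", "plain text extracted from the CAS");
      solr.add(doc);                                  // would be called once per processed CAS
      solr.commit();
    }
  }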

So, before starting, I would like to know your opinions, and whether anyone
is interested in collaborating or has some code that could be integrated into
this.
 



Re: facets and stopwords

2009-07-08 Thread JCodina



hossman wrote:
 
 
 but are you sure that example would actually cause a problem?
 I suspect if you index that exact sentence as is, you wouldn't see the
 facet count for si or que increase at all.
 
 If you do a query for {!raw field=content}que you bypass the query
 parsers (which are respecting your stopwords file) and see all docs that
 contain the raw term que in the content field.
 
 If you look at some of the docs that match, and paste their content field
 into the analysis tool, I think you'll see that the problem comes from
 using the whitespace tokenizer, and is masked by using the WDF
 after the stop filter ... things like Que? are getting ignored by the
 stop filter, but ultimately winding up in your index as que
 
 
 -Hoss
 
 

Yes, you are right: que?, que, que... I need to change the analyzer. They are
not caught by the stop filter because I use the whitespace tokenizer; I will
use the StandardTokenizer.

Thanks Hoss




Re: facets and stopwords

2009-07-01 Thread JCodina

Sorry, I was too cryptic.

If you follow this link

http://projecte01.development.barcelonamedia.org/fonetic/

you will see a Top Words list (in Spanish and stemmed). In the list there is
the word si, which is in 20649 documents.
If you click on this word, the system performs the query content:si, with no
answers at all.
The same for la: it is in 17881 documents, but the query content:la gives no
answers at all.

The facets list is generated by the query
http://projecte01.development.barcelonamedia.org/solr/select/?rows=0&start=0&q=*:*&facet=true&facet.limit=-1&facet.field=content&facet.field=entities_misc&wt=json&json.wrf=jsonp1246437157825&jsoncallback=jsonp1246437157825&_=1246437158023

But the question is: why are these two words (among others) there, if they
are stop words?

To see what's going on in the index, I have tested with the analyzer

If I select the field content and I write the text

las cosas que si no pasan la proxima vez si que no veràs

I get the following tokens at the end of the analyzer:

las  cosa  pasan  proxima  vez  sí  verà

where que, si, no, la are removed, as they are treated as stop words.

But... in the schema browser
http://projecte01.development.barcelonamedia.org/solr/admin/schema.jsp
in the field content, que is the 3rd word, no the 4th, and si and la are
among the top 40 terms...

The analyzer for the content field can be seen on that page and has the
following components:


Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

  1. org.apache.solr.analysis.StopFilterFactory
     args: {enablePositionIncrements: true, words: stopwords.txt, ignoreCase: true}
  2. org.apache.solr.analysis.WordDelimiterFilterFactory
     args: {catenateWords: 1, catenateNumbers: 1, splitOnCaseChange: 1, catenateAll: 0, generateNumberParts: 1, generateWordParts: 1}
  3. org.apache.solr.analysis.LowerCaseFilterFactory
     args: {}
  4. org.apache.solr.analysis.SnowballPorterFilterFactory
     args: {languange: Spanish}
  5. org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
     args: {}

The field is indexed, tokenized, stored and termvectors are stored.

So, why are the stopwords in the index?








Top tf_idf in TermVectorComponent

2009-06-25 Thread JCodina

In order to perform any further study of the result set, like clustering, the
TermVectorComponent gives the list of words with the corresponding tf and
idf, but this list can be huge for each document, and most of the terms may
have a low tf or a too-high df.
Maybe it is useful to compare the relative increase of df against the
collection in order to improve the facets (show only those terms whose
relative df in the query is higher than in the full collection).

To perform this, it could be interesting for the TermVectorComponent to sort
the results by one of these options:
* tf
* df
* tf/df (to simplify), or tf*idf where idf is computed as log(total_docs/df)
and truncate the list to a given number of words or a given cutoff value (a
sketch of the computation follows).
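
A minimal sketch of that ranking in plain Java, assuming the tf and df counts
have already been read out of the TermVectorComponent response; the class and
method names are mine:

  import java.util.AbstractMap;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;
  import java.util.Map;

  public class TopTfIdf {
    // Rank terms by tf * log(totalDocs / df) and keep the best n.
    public static List<Map.Entry<String, Double>> topTerms(
        Map<String, Integer> tf, Map<String, Integer> df, long totalDocs, int n) {
      List<Map.Entry<String, Double>> scored = new ArrayList<Map.Entry<String, Double>>();
      for (Map.Entry<String, Integer> e : tf.entrySet()) {
        int docFreq = df.get(e.getKey());
        double score = e.getValue() * Math.log((double) totalDocs / docFreq);
        scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), score));
      }
      Collections.sort(scored, new Comparator<Map.Entry<String, Double>>() {
        public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
          return Double.compare(b.getValue(), a.getValue()); // highest tf*idf first
        }
      });
      return scored.subList(0, Math.min(n, scored.size()));
    }
  }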
 
Or maybe there is another way to perform this?
Joan



version of lucene

2009-06-15 Thread JCodina

I have the solr-nightly build from last week, and in the lib folder I find
lucene-core-2.9-dev.jar.
I need to make some changes to the shingle filter in order to remove
stopwords from bigrams, but to do so I need to compile Lucene; the problem is
that the released Lucene is version 2.4, not 2.9.
If I check out version 2.4 with Subversion, then compiling Solr gives the
following error:
.../apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java:21:
cannot find symbol
[javac] symbol  : class Collector
[javac] location: package org.apache.lucene.search
[javac] import org.apache.lucene.search.Collector;


Any hints on the right version of Lucene/Solr to be able to use Solr 1.4?

Joan



facets and stopwords

2009-06-09 Thread JCodina

I have a text field from which I remove stop words. As a first approximation
I use facets to see the most common words in the text, but... stopwords are
there, and if I search for documents containing the stopwords, then there are
no documents in the answer.
You can test it at this address (using SolrJS; the texts are in Spanish, but
you can check in the top words that que or en are there), but if you click on
them to perform the search, no results are given:
http://projecte01.development.barcelonamedia.org/fonetic/
or the administrator at
http://projecte01.development.barcelonamedia.org/solr/admin
so you can check what's going on in the content field.
I use the DataImportHandler to import the data, and the Solr analyzer shows
me how the stopwords are removed from both the query and the indexed text, so
why do facets show me these words?




Re: Build Solr to run SolrJS

2008-11-22 Thread JCodina

Yesterday I got it running; I thought I had posted this, but I must not have
pushed the post button.
The problem I had to solve to run it with Velocity was to copy the Velocity
jar files to /lib by hand.



Erik Hatcher wrote:
 
 I just got the client-side demo on trunk to work (with a few tweaks to make
 it work with the example core Solr data).
 
 On trunk follow these steps:
 
 * root directory: ant example
 * separate console, index data: cd example/exampledocs; java -jar post.jar *.xml
 * open contrib/javascript/example/testClientside.html in your browser
 
 The serverSide.html example is still not quite wired together properly to be
 an easy-to-run demo, as what's on trunk doesn't include the reuters data and
 the velocity stuff is not wired in with it yet either.
 
 We'll get this working better/cleaner as we go, so we appreciate your early
 adopter help ironing out this stuff.
 
   Erik
 
 On Nov 20, 2008, at 5:44 PM, JCodina wrote:
 

 I could not manage, yet to use it. :confused:
 My doubts are:
 - must I  download solr from svn - trunk?
 - then, must I apply the patches of solrjs and velocity and unzip  
 the files?
 or is this  already in trunk?
 because  trunk contains velocity and javascript in contrib.
   but does not find the velocity
 - How do I edit/activate SolrJs to adapt it to my data, the wiki  
 page says
 how to deploy the sample, and I looked at the sample page from the  
 sample
 site, but I don't find how to manually install it on a tomcat server.
 PD. If I get how to do it, I promise I will introduce that  
 information in
 the solrjs wiki page  =) .


 Matthias Epheser wrote:

 Erik Hatcher schrieb:

 On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
 Matthias and Ryan - let's get SolrJS integrated into
 contrib/velocity.  Any objections/reservations?

 As SolrJS may be used without velocity at all (using eg.
 ClientSideWidgets), is it possible to put it into contrib/ 
 javascript
 and create a dependency to contrib/velocity for  
 ServerSideWidgets?

 Sure, contrib/javascript sounds perfect.

 If that's ok, I'll have a look at the directory structure and the
 current ant build.xml to make them fit into the common solr  
 structure
 and build.

 Awesome, thanks!

 Just uploaded solrjs.zip to
 https://issues.apache.org/jira/browse/SOLR-868. It
 is intended to be extracted in contrib/javascript and supports the
 following ant
 targets:

 * ant dist - creates a single js file and a jar that holds velocity
 templates.
 * ant docs - creates js docs. test in browser: doc/index.html
 * ant example-init - (depends ant dist on solr root) copies the  
 current
 built
 of solr.war and solr-velocity.jar to example/testsolr/..
 * ant example-start - starts the testsolr server on port 8983
 * ant example-import - imports 3000 test data rows (requires a  
 started
 testserver)



Erik





 
 
 






DataImportHandler JDBC case problems

2008-11-21 Thread JCodina

I tried to set up a DataImportHandler where the column name user and the
field name User are the same except for the case of the first letter.
When performing a full import, I was getting different sorts of errors on
that field depending on the case of the names; I tried the four possible
combinations, and none worked.
In the end, I changed the column name in the database to User and everything
was fine.
  
<dataConfig>
  <document>
    <entity name="item" query="select * from DeNDocs">
      <field column="ID" name="id" />
      <field column="User" name="User" />
    </entity>
  </document>
</dataConfig>




Re: Build Solr to run SolrJS

2008-11-20 Thread JCodina

I could not manage to use it yet. :confused:
My doubts are:
- Must I download Solr from svn trunk?
- Then, must I apply the patches for SolrJS and Velocity and unzip the files,
or is this already in trunk? Trunk contains velocity and javascript in
contrib, but it does not find the Velocity classes.
- How do I edit/activate SolrJS to adapt it to my data? The wiki page says
how to deploy the sample, and I looked at the sample page from the sample
site, but I can't find how to manually install it on a Tomcat server.
PS: If I figure out how to do it, I promise I will add that information to
the SolrJS wiki page =).


Matthias Epheser wrote:
 
 Erik Hatcher schrieb:
 
 On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
 Matthias and Ryan - let's get SolrJS integrated into 
 contrib/velocity.  Any objections/reservations?

 As SolrJS may be used without velocity at all (using eg. 
 ClientSideWidgets), is it possible to put it into contrib/javascript 
 and create a dependency to contrib/velocity for ServerSideWidgets?
 
 Sure, contrib/javascript sounds perfect.
 
 If that's ok, I'll have a look at the directory structure and the 
 current ant build.xml to make them fit into the common solr structure 
 and build.
 
 Awesome, thanks!
 
 Just uploaded solrjs.zip to
 https://issues.apache.org/jira/browse/SOLR-868. It 
 is intended to be extracted in contrib/javascript and supports the
 following ant 
 targets:
 
 * ant dist - creates a single js file and a jar that holds velocity templates.
 * ant docs - creates js docs. test in browser: doc/index.html
 * ant example-init - (depends ant dist on solr root) copies the current
   build of solr.war and solr-velocity.jar to example/testsolr/..
 * ant example-start - starts the testsolr server on port 8983
 * ant example-import - imports 3000 test data rows (requires a started
   testserver)
 
 
 
 Erik
 
 
 
 




Re: Build Solr to run SolrJS

2008-11-17 Thread JCodina


To give you more information.

The error I get is this one:

java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter
(wrong name: contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1847)
        at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at ...


And in the build, if I do a build-contrib-dist, I get these messages:
...
build:
  [jar] Building jar:
/home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar
dist:
 [copy] Copying 1 file to
/home/joan/workspace/solr/build/web/WEB-INF/lib
 [copy] Copying 1 file to /home/joan/workspace/solr/dist
init:
init-forrest-entities:
compile-common:
compile:
make-manifest:
compile:
[javac] Compiling 4 source files to
/home/joan/workspace/solr/contrib/velocity/target/classes
build:
  [jar] Building jar:
/home/joan/workspace/solr/contrib/velocity/src/main/solr/lib/apache-solr-velocity-1.4-dev.jar
dist:
...
where the dataimporthandler jar seems to be copied into the dist folders, but
the velocity one is not.


Hope this saves you some time.

Joan



Build Solr to run SolrJS

2008-11-16 Thread JCodina

I downloaded solr/trunk and built it.
Everything seems to work except that the VelocityResponseWriter is not in the
war file, and Tomcat gives a configuration error when using the conf.xml of
SolrJS.
Any suggestion on how to build Solr so that it works with SolrJS?


Thanks
Joan Codina 