Re: Embedded Solr

2011-03-22 Thread Paul Libbrecht
Greg,

I guess the question is related to the App Store's ban of Java.
If you follow the java-...@lists.apple.com mailing list you'll see that a full 
JVM is a problem, but that skimmed-down, self-installed JVMs have been OK at 
least once.
Solr is in the family of things that do not need a GUI, so you have a decent 
chance of skimming it down efficiently into something that doesn't rely on a 
pre-installed JVM.

I believe the results of such an experiment would interest this list (and the 
list above).

paul


On 22 March 2011 at 00:53, Bill Bell wrote:

 Yes, it needs Java to run
 
 Bill Bell
 Sent from mobile
 
 
 On Mar 21, 2011, at 2:30 PM, Greg Georges greg.geor...@biztree.com wrote:
 
 Hello all,
 
 I am using Solr in a Java architecture right now, and the results are great. 
 The app development team has asked me if it is possible to embed Solr, but 
 the request is to embed it into a C++ app and mac app using objective C. I 
 do not have much knowledge on embedded Solr. Does it need a JVM? Is what 
 they are asking me possible and are there any resources for it? Thanks
 
 Greg



when to change rows param?

2011-03-22 Thread Paul Libbrecht

Hello list,

I've been successfully using my own QueryComponent (which extends the standard 
search component) to rewrite web-received parameters sent from the 
(ExtJS-based) JavaScript client.
This allows a certain amount of query rewriting, which is good.
I tried to change the rows parameter there (which arrives as limit in the query, 
per the underpinnings of ExtJS), but it seems that this is not enough.

Which component should I subclass to change the rows parameter?

thanks in advance

paul
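In case it helps future readers, here is a minimal plain-Java sketch of the rewrite itself (a HashMap stands in for Solr's request parameters, and the class and method names are made up for illustration, not Solr API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a component's parameter-rewriting step:
// copy the ExtJS-style "limit" parameter into Solr's "rows" parameter
// before the query is executed.
public class ParamRewriter {

    // Rewrites the request parameters in place and returns them.
    static Map<String, String> rewrite(Map<String, String> params) {
        String limit = params.get("limit");
        if (limit != null) {
            params.put("rows", limit);   // Solr reads "rows", not "limit"
            params.remove("limit");
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "solr");
        params.put("limit", "25");
        rewrite(params);
        System.out.println(params.get("rows")); // prints "25"
    }
}
```

Note that where this runs matters: if a component rewrites rows only after the standard query component has already read it, the change appears to have no effect, which may be what is happening here.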

Re: working with collection : Where is default schema.xml

2011-03-22 Thread Geert-Jan Brits
Changing the default schema.xml to what you want is the way to go for most
of us.
It's a good learning experience as well, since it contains a lot of
documentation about the options that may be of interest to you.

Cheers,
Geert-Jan

2011/3/22 geag34 sac@gmail.com

 OK, thanks.

 It was my fault. I had created the collection with a Lucid Imagination perl
 script.

 I will erase the schema.xml.

 Thanks

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/working-with-collection-Where-is-default-schema-xml-tp2700455p2712496.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SOLR-2242-distinctFacet.patch

2011-03-22 Thread Isha Garg

Hi,
  I want to inquire about the patch named distinct (SOLR-2242-distinctFacet.patch) 
that is available with the Solr 4.0 trunk.


Thanks!
Isha


Re: Help with explain query syntax

2011-03-22 Thread Glòria Martínez
Thank you very much!

On Wed, Mar 9, 2011 at 2:01 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 It's probably the WordDelimiterFilter:

  org.apache.solr.analysis.WordDelimiterFilterFactory
 args:{preserveOriginal:
  1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
  generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }

 Get rid of the preserveOriginal=1 in the query analyzer.

 -Yonik
 http://lucidimagination.com

 On Tue, Mar 1, 2011 at 9:01 AM, Glòria Martínez
 gloria.marti...@careesma.com wrote:
  Hello,
 
  I can't understand why this query is not matching anything. Could someone
  help me please?
 
  *Query*
 
 http://localhost:8894/solr/select?q=linguajob.pl&qf=company_name&wt=xml&qt=dismax&debugQuery=on&explainOther=id%3A1
 
  <response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="explainOther">id:1</str>
      <str name="debugQuery">on</str>
      <str name="q">linguajob.pl</str>
      <str name="qf">company_name</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">linguajob.pl</str>
    <str name="querystring">linguajob.pl</str>
    <str name="parsedquery">
      +DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()
    </str>
    <str name="parsedquery_toString">
      +(company_name:"(linguajob.pl linguajob) pl")~0.01 ()
    </str>
    <lst name="explain"/>
    <str name="otherQuery">id:1</str>
    <lst name="explainOther">
      <str name="1">
  0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
    0.0 = no match on required clause (company_name:"(linguajob.pl linguajob) pl") *- What does this syntax (field:(token1 token2) token3) mean?*
      0.0 = (NON-MATCH) fieldWeight(company_name:"(linguajob.pl linguajob) pl" in 0), product of:
        0.0 = tf(phraseFreq=0.0)
        1.6137056 = idf(company_name:"(linguajob.pl linguajob) pl")
        0.4375 = fieldNorm(field=company_name, doc=0)
      </str>
    </lst>
    <str name="QParser">DisMaxQParser</str>
    <null name="altquerystring"/>
    <null name="boostfuncs"/>
    <lst name="timing">
    ...
  </response>
 
 
 
  There's only one document indexed:
 
  *Document*
  http://localhost:8894/solr/select?q=1&qf=id&wt=xml&qt=dismax
  <response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="qf">id</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
      <str name="q">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="company_name">LinguaJob.pl</str>
      <str name="id">1</str>
      <int name="status">6</int>
      <date name="timestamp">2011-03-01T11:14:24.553Z</date>
    </doc>
  </result>
  </response>
 
  *Solr Admin Schema*
  Field: company_name
  Field Type: text
  Properties: Indexed, Tokenized, Stored
  Schema: Indexed, Tokenized, Stored
  Index: Indexed, Tokenized, Stored
 
  Position Increment Gap: 100
 
  Index Analyzer: org.apache.solr.analysis.TokenizerChain Details
  Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
  schema.UnicodeNormalizationFilterFactory args:{composed: false
  remove_modifiers: true fold: true version: java6 remove_diacritics: true
 }
  org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
  ignoreCase: true enablePositionIncrements: true }
  org.apache.solr.analysis.WordDelimiterFilterFactory
 args:{preserveOriginal:
  1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1
  generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 }
  org.apache.solr.analysis.LowerCaseFilterFactory args:{}
  org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 
  Query Analyzer: org.apache.solr.analysis.TokenizerChain Details
  Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
  schema.UnicodeNormalizationFilterFactory args:{composed: false
  remove_modifiers: true fold: true version: java6 remove_diacritics: true
 }
  org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:
 synonyms.txt
  expand: true ignoreCase: true }
  org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
  ignoreCase: true }
  org.apache.solr.analysis.WordDelimiterFilterFactory
 args:{preserveOriginal:
  1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
  generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }
  org.apache.solr.analysis.LowerCaseFilterFactory args:{}
  org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 
  Docs: 1
  Distinct: 5
  Top 5 terms
  term frequency
  lingua 1
  linguajob.pl 1
  linguajobpl 1
  pl 1
  job 1
 
  *Solr Analysis*
  Field name: company_name
  Field value (Index): LinguaJob.pl
  Field value (Query): linguajob.pl
 
  *Index Analyzer
 
  org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position 1
  term text LinguaJob.pl
  term type word
  source start,end 0,12
  payload
 
  schema.UnicodeNormalizationFilterFactory {composed=false,
  remove_modifiers=true, fold=true, version=java6, remove_diacritics=true}
  term 

Re: Transform a SolrDocument into a SolrInputDocument

2011-03-22 Thread Marc SCHNEIDER
Ok that's perfectly clear.
Thanks a lot for all your answers!

Marc.

On Mon, Mar 21, 2011 at 4:34 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Mon, Mar 21, 2011 at 8:33 PM, Marc SCHNEIDER
 marc.schneide...@gmail.com wrote:
  Hi Erick,
 
  Thanks for your answer.
  I'm quite a newbie to Solr, so I'm a little bit confused.
  Do you mean that (using Solrj in my case) I should add all fields (stored
  and not stored) before adding the document to the index?
 [...]

 No, what he means is that the Solr output contains *only*
 stored fields. There might be fields that are indexed (i.e.,
 available for search), but not stored in the current index.
 In such a case, these fields will be blank in the Solr output,
 and consequently, in the updated document.

 Regards,
 Gora
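Gora's point, that only stored fields survive the round trip through search results, can be sketched in plain Java (plain maps stand in for SolrJ's SolrDocument/SolrInputDocument, and the field names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates why re-indexing a document built from search results can
// lose data: the result only carries stored fields, so any field that
// was indexed but not stored is simply absent from the copy.
public class StoredFieldCopy {

    // Builds an "input document" from the fields of a retrieved document.
    static Map<String, Object> toInputDoc(Map<String, Object> retrieved) {
        return new HashMap<>(retrieved); // only stored fields are present
    }

    public static void main(String[] args) {
        // What Solr would return for a doc whose "body" field is indexed
        // but not stored: "body" never appears in the response.
        Map<String, Object> retrieved = new HashMap<>();
        retrieved.put("id", "1");
        retrieved.put("title", "Hello");   // a stored field

        Map<String, Object> input = toInputDoc(retrieved);
        System.out.println(input.containsKey("body")); // prints "false"
    }
}
```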



solr on the cloud

2011-03-22 Thread Dmitry Kan
hey folks,

I have tried running the sharded solr with zoo keeper on a single machine.
The SOLR code is from current trunk. It runs nicely. Can you please point me
to a page, where I can check the status of the solr on the cloud development
and available features, apart from http://wiki.apache.org/solr/SolrCloud ?

Basically, of high interest is checking out Map-Reduce for distributed
faceting; is it even possible with the trunk?

-- 
Regards,

Dmitry Kan


Re: DIH Issue(newbie to solr)

2011-03-22 Thread Gora Mohanty
On Mon, Mar 21, 2011 at 10:59 PM, neha pneha...@yahoo.com wrote:
 Thanks Gora, it works! Thanks again. One last question: the documents get
 indexed well, but when I issue the full-import command it still says
 Total Requests made to DataSource 0
[...]

Not sure why that is, but would guess that it is something to do with
how a FileDataSource is used. Unfortunately, I do not have the time
to test this right now, but you could check if you get the same zero
requests with the RSS indexing example in example/example-DIH/solr/rss/
in the Solr distribution?

Regards,
Gora


Re: email - DIH

2011-03-22 Thread Erick Erickson
Not unless you provide a lot more data. Have you
inspected the Solr logs and seen any anomalies?

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Mar 21, 2011 at 3:56 PM, Matias Alonso matiasgalo...@gmail.com wrote:
 Hi,


 I’m using the Data Import Handler to index emails.

 The problem is that not all the emails get indexed when I do a full import.

 Does anyone have any idea?


 Regards,

 --
 Matias.



Re: How to upgrade to Solr4.0 to use Result Grouping?

2011-03-22 Thread Erick Erickson
Awww, rats. Thanks Yonik, I keep getting this mixed up...

Erick

On Mon, Mar 21, 2011 at 2:57 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Mon, Mar 21, 2011 at 10:20 AM, Erick Erickson
 erickerick...@gmail.com wrote:
 Get the release and re-index? You can get a trunk
 version either through SVN or from the nightly build
 at https://builds.apache.org/hudson/view/S-Z/view/Solr/

 Note that 3.1 also has result grouping

 Result grouping / field collapsing is in trunk (4.0-dev) only.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



help with Solr installation within Tomcat7

2011-03-22 Thread ramdev.wudali
Hi All:
   I have just started using Solr and have it successfully installed within a 
Tomcat7 webapp server.
I have also indexed documents using the SolrJ interfaces. The following is my 
problem:

I installed Solr under the Tomcat7 folders and set up an XML configuration file 
to indicate the Solr home variables, as detailed on the wiki (for a Solr install 
within Tomcat).
The indexes seem to reside within the solr_home folder under the data folder 
(solr_home/data/index).

However, when I make a zip copy of the complete install (i.e. Tomcat with 
Solr) and move it to a different machine and unzip/install it, 
the index seems to be inaccessible. (I did change the solr.xml configuration 
variables to point to the new location.)

From what I know, with Tomcat installations it should be as simple as zipping 
a current working installation and unzipping/installing on a different 
machine/location.

Am I missing something that makes Solr hardcode the path to the index in an 
install?

Simply put, I would like to know how to transport an existing install of 
Solr within Tomcat 7 from one machine to another and still have it working.

Ramdev


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
Thanks everyone for the advice. I checked out a recent version from SVN and
ran:

ant clean example

This worked just fine. However when I went to start the solr server, I get
this error message:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.dataimport.DataImportHandler'

It looks like those files are there:

contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/

But for some reason, they aren't able to be found. Where would I update this
setting and what would I update it to?

Thanks,

Brian Lamb

On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson erickerick...@gmail.com wrote:

 OK, I think you're jumping ahead and trying to do
 too many things at once.

 What did you download? Source? The distro? The error
 you posted usually happens for me when I haven't
 compiled the example target from source. So I'd guess
 you don't have the proper targets built. This assumes you
 downloaded the source via SVN.

 If you downloaded a distro, I'd start by NOT copying anything
 anywhere, just go to the example code and start Solr. Make
 sure you have what you think you have.

 I've seen interesting things get cured by removing the entire
 directory where your servlet container unpacks war files, but
 that's usually in development environments.

 When I get in these situations, I usually find it's best to back
 up, do one thing at a time and verify that I get the expected
 results at each step. It's tedious, but

 Best
 Erick


 On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote:
  downloaded a recent version and
there were the following files/folders:
   
build.xml
dev-tools
LICENSE.txt
lucene
NOTICE.txt
README.txt
solr
   
So I did cp -r solr/* /path/to/solr/stuff/ and
  started solr. I didn't get
any error message but I only got the following
  messages:
 
  How do you start solr? using java -jar start.jar? Did you run 'ant clean
 example' in the solr folder?
 
 
 
 



Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
I found the following in the build.xml file:

<invoke-javadoc destdir="${build.javadoc}">
  <sources>
    <packageset dir="${src}/common" />
    <packageset dir="${src}/solrj" />
    <packageset dir="${src}/java" />
    <packageset dir="${src}/webapp/src" />
    <packageset dir="contrib/dataimporthandler/src/main/java" />
    <packageset dir="contrib/clustering/src/main/java" />
    <packageset dir="contrib/extraction/src/main/java" />
    <packageset dir="contrib/uima/src/main/java" />
    <packageset dir="contrib/analysis-extras/src/java" />
    <group title="Core" packages="org.apache.*" />
    <group title="Common" packages="org.apache.solr.common.*" />
    <group title="SolrJ" packages="org.apache.solr.client.solrj*" />
    <group title="contrib: DataImportHandler"
        packages="org.apache.solr.handler.dataimport*" />
    <group title="contrib: Clustering"
        packages="org.apache.solr.handler.clustering*" />
    <group title="contrib: Solr Cell"
        packages="org.apache.solr.handler.extraction*" />
    <group title="contrib: Solr UIMA" packages="org.apache.solr.uima*" />
  </sources>
</invoke-javadoc>

It looks like the dataimport handler path is correct in there, so I don't
understand why it's not being compiled.

I ran ant example again today but I'm still getting the same error.

Thanks,

Brian Lamb

On Tue, Mar 22, 2011 at 11:28 AM, Brian Lamb
brian.l...@journalexperts.com wrote:

 Thanks everyone for the advice. I checked out a recent version from SVN and
 ran:

 ant clean example

 This worked just fine. However when I went to start the solr server, I get
 this error message:

 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler'

 It looks like those files are there:

 contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/

 But for some reason, they aren't able to be found. Where would I update
 this setting and what would I update it to?

 Thanks,

 Brian Lamb

 On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 OK, I think you're jumping ahead and trying to do
 too many things at once.

 What did you download? Source? The distro? The error
 you posted usually happens for me when I haven't
 compiled the example target from source. So I'd guess
 you don't have the proper targets built. This assumes you
 downloaded the source via SVN.

 If you downloaded a distro, I'd start by NOT copying anything
 anywhere, just go to the example code and start Solr. Make
 sure you have what you think you have.

 I've seen interesting things get cured by removing the entire
 directory where your servlet container unpacks war files, but
 that's usually in development environments.

 When I get in these situations, I usually find it's best to back
 up, do one thing at a time and verify that I get the expected
 results at each step. It's tedious, but

 Best
 Erick


 On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote:
  downloaded a recent version and
there were the following files/folders:
   
build.xml
dev-tools
LICENSE.txt
lucene
NOTICE.txt
README.txt
solr
   
So I did cp -r solr/* /path/to/solr/stuff/ and
  started solr. I didn't get
any error message but I only got the following
  messages:
 
  How do you start solr? using java -jar start.jar? Did you run 'ant clean
 example' in the solr folder?
 
 
 
 





Re: Adding the suggest component

2011-03-22 Thread Ahmet Arslan

--- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote:

 From: Brian Lamb brian.l...@journalexperts.com
 Subject: Re: Adding the suggest component
 To: solr-user@lucene.apache.org
 Cc: Erick Erickson erickerick...@gmail.com
 Date: Tuesday, March 22, 2011, 5:28 PM
 Thanks everyone for the advice. I
 checked out a recent version from SVN and
 ran:
 
 ant clean example
 
 This worked just fine. However when I went to start the
 solr server, I get
 this error message:
 
 SEVERE: org.apache.solr.common.SolrException: Error loading
 class
 'org.apache.solr.handler.dataimport.DataImportHandler'

run 'ant clean dist' and copy trunk/solr/dist/

apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
apache-solr-dataimporthandler-4.0-SNAPSHOT.jar

to solrHome/lib directory.





  


Re: email - DIH

2011-03-22 Thread Matias Alonso
Thank you very much for your answer, Erick.


My apologies for the previous email; my problem is that I don't speak
English very well and I'm new to the world of mailing lists.


The problem is that I'm indexing emails through the Data Import Handler using
Gmail with imaps; I do this so the email list can be searched in the future.
The emails are indexed only partially and I can't find out why it doesn't
index all of the emails.


Below I show you the configuration of my DIH.


<dataConfig>
  <document>
    <entity
      name="gmail"
      processor="MailEntityProcessor"
      transformer="LogTransformer"
      user="em...@gmail.com"
      password="password"
      host="imap.gmail.com"
      protocol="imaps"
      fetchMailsSince="2010-01-01 00:00:00"
      folders="inbox"
      deltaFetch="false"
      processAttachement="false"
      batchSize="100"
      fetchSize="1024"
      recurse="true" />
  </document>
</dataConfig>



The dates of my emails are all later than “2010-01-01 00:00:00”.

I've done a full import and no errors were found, but the status says that
28 documents were added, while in the console I found 35 messages.

Below I show you the status screen first, and then part of the console
output.



Status:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</lst>
<str name="command">status</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">28</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2011-03-22 15:55:12</str>
  <str name="">Indexing completed. Added/Updated: 28 documents. Deleted 0 documents.</str>
  <str name="Committed">2011-03-22 15:55:20</str>
  <str name="Optimized">2011-03-22 15:55:20</str>
  <str name="Total Documents Processed">28</str>
  <str name="Time taken">0:0:8.520</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>



"...
Mar 22, 2011 3:55:14 PM org.apache.solr.handler.dataimport.MailEntityProcessor connectToMailBox
INFO: Connected to mailbox
Mar 22, 2011 3:55:15 PM org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next
INFO: Opened folder : inbox
Mar 22, 2011 3:55:15 PM org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next
INFO: Added its children to list  :
Mar 22, 2011 3:55:15 PM org.apache.solr.handler.dataimport.MailEntityProcessor$FolderIterator next
INFO: NO children :
Mar 22, 2011 3:55:16 PM org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator <init>
INFO: Total messages : 35
Mar 22, 2011 3:55:16 PM org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator <init>
INFO: Search criteria applied. Batching disabled
Mar 22, 2011 3:55:19 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
..."



Regards,

Matias.





2011/3/22 Erick Erickson erickerick...@gmail.com

 Not unless you provide a lot more data. Have you
 inspected the Solr logs and seen any anomalies?

 Please review:
 http://wiki.apache.org/solr/UsingMailingLists

 Best
 Erick

 On Mon, Mar 21, 2011 at 3:56 PM, Matias Alonso matiasgalo...@gmail.com
 wrote:
  Hi,
 
 
  I’m using the Data Import Handler to index emails.
 
  The problem is that not all the emails get indexed when I do a full
 import.
 
  Does anyone have any idea?
 
 
  Regards,
 
  --
  Matias.
 



Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
Awesome! That fixed that problem. I'm getting another class not found error
but I'll see if I can fix it on my own first.

On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote:


 --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote:

  From: Brian Lamb brian.l...@journalexperts.com
  Subject: Re: Adding the suggest component
  To: solr-user@lucene.apache.org
  Cc: Erick Erickson erickerick...@gmail.com
  Date: Tuesday, March 22, 2011, 5:28 PM
  Thanks everyone for the advice. I
  checked out a recent version from SVN and
  ran:
 
  ant clean example
 
  This worked just fine. However when I went to start the
  solr server, I get
  this error message:
 
  SEVERE: org.apache.solr.common.SolrException: Error loading
  class
  'org.apache.solr.handler.dataimport.DataImportHandler'

 run 'ant clean dist' and copy trunk/solr/dist/

 apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
 apache-solr-dataimporthandler-4.0-SNAPSHOT.jar

 to solrHome/lib directory.









Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
I fixed a few other exceptions it threw when I started the server but I
don't know how to fix this one:

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.handler.dataimport.DataImportHandler
at java.lang.Class.forName0(Native Method)

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at
org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72)

Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

I've searched Google but haven't been able to find a reason why this happens
and how to fix it.

Thanks,

Brian Lamb

On Tue, Mar 22, 2011 at 12:54 PM, Brian Lamb
brian.l...@journalexperts.com wrote:

 Awesome! That fixed that problem. I'm getting another class not found error
 but I'll see if I can fix it on my own first.


 On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote:


 --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote:

  From: Brian Lamb brian.l...@journalexperts.com
  Subject: Re: Adding the suggest component
  To: solr-user@lucene.apache.org
  Cc: Erick Erickson erickerick...@gmail.com
  Date: Tuesday, March 22, 2011, 5:28 PM
  Thanks everyone for the advice. I
  checked out a recent version from SVN and
  ran:
 
  ant clean example
 
  This worked just fine. However when I went to start the
  solr server, I get
  this error message:
 
  SEVERE: org.apache.solr.common.SolrException: Error loading
  class
  'org.apache.solr.handler.dataimport.DataImportHandler'

 run 'ant clean dist' and copy trunk/solr/dist/

 apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar
 apache-solr-dataimporthandler-4.0-SNAPSHOT.jar

 to solrHome/lib directory.










Solr 1.4.1 and Tika 0.9 - some tests not passing

2011-03-22 Thread Andreas Kemkes
Due to some PDF indexing issues with the Solr 1.4.1 distribution, we would like 
to upgrade it to Tika 0.9, as the issues are not occurring in Tika 0.9.

With the changes we made to Solr 1.4.1, we can successfully index the 
previously 
failing PDF documents.

Unfortunately we cannot get the HTML-related tests to pass.

The following asserts in ExtractingRequestHandlerTest.java are failing:

assertQ(req("title:Welcome"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertQ(req("t_href:http"), "//*[@numFound='2']");
assertQ(req("t_href:http"), "//doc[1]/str[.='simple3']");
assertQ(req("+id:simple4 +t_content:Solr"), "//*[@numFound='1']");
assertQ(req("defaultExtr:http\\://www.apache.org"), "//*[@numFound='1']");
assertQ(req("+id:simple2 +t_href:[* TO *]"), "//*[@numFound='1']");
assertTrue(val + " is not equal to " + linkNews, val.equals(linkNews) == 
true); // there are two <a> tags, and they get collapsed

Below are the differences in output from Tika 0.4 and Tika 0.9 for simple.html.

Tika 0.9 has additional meta tags, a shape attribute, and some additional 
whitespace. Is this what throws it off?

What do we need to consider so that Solr 1.4.1 will process the Tika 0.9 output 
correctly?

Do we need to configure different filters and tokenizers?  Which ones?

Or is it something else entirely?

Thanks in advance for any help,

Andreas

$ java -jar tika-app-0.4.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html

<?xml version="1.0" encoding="UTF-8"?>
<head>
<title>Welcome to Solr</title>
</head>
<body>
<p>
  Here is some text
</p>

Here is some text in a div
This has a <a href="http://www.apache.org">link</a>.


</body>
</html>

$ java -jar tika-app-0.9.jar 
../../../apache-solr-1.4.1-with-tika-0.9/contrib/extraction/src/test/resources/simple.html
 

<?xml version="1.0" encoding="UTF-8"?>
<head>
<meta name="Content-Length" content="209"/>
<meta name="Content-Encoding" content="ISO-8859-1"/>
<meta name="Content-Type" content="text/html"/>
<meta name="resourceName" content="simple.html"/>
<title>Welcome to Solr</title>
</head>
<body>
<p>
  Here is some text
</p>

Here is some text in a div

This has a <a href="http://www.apache.org">link</a>.

</body>
</html>



  

Re: Adding the suggest component

2011-03-22 Thread Ahmet Arslan
 java.lang.NoClassDefFoundError: Could not initialize class
 org.apache.solr.handler.dataimport.DataImportHandler
 at java.lang.Class.forName0(Native Method)
 
 java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
 at
 org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72)
 
 Caused by: java.lang.ClassNotFoundException:
 org.slf4j.LoggerFactory
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 

You can find the slf4j-related jars in \trunk\solr\lib, but this error is weird.


  


Architecture question about solr sharding

2011-03-22 Thread JohnRodey
I have an issue and I'm wondering if there is an easy way around it with just
SOLR.

I have multiple SOLR servers and a field in my schema is a relative path to
a binary file.  Each SOLR server is responsible for a different subset of
data that belongs to a different base path.

For Example...

My directory structure may look like this:
/someDir/Jan/binaryfiles/...
/someDir/Feb/binaryfiles/...
/someDir/Mar/binaryfiles/...
/someDir/Apr/binaryfiles/...

Server1 is responsible for Jan, Server2 for Feb, etc...

And a response document may have a field like this
my entry
binaryfiles/12345.bin

How can I tell from my main search server which server returned a result?
I cannot put the full path in the index because my path structure might
change in the future.  Using this example it may go to '/someDir/Jan2011/'.

I basically need to find a way to say 'Ah! server01 returned this result, so
it must be in /someDir/Jan'

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Architecture-question-about-solr-sharding-tp2716417p2716417.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding the suggest component

2011-03-22 Thread Brian Lamb
That fixed that error as well as the could not initialize Dataimport class
error. Now I'm getting:

org.apache.solr.common.SolrException: Error Instantiating Request Handler,
org.apache.solr.handler.dataimport.DataImportHandler is not a
org.apache.solr.request.SolrRequestHandler

I can't find anything on this one. What I've added to the solrconfig.xml
file matches whats in example-DIH so I don't quite understand what the issue
is here. It sounds to me like it is not declared properly somewhere but I'm
not sure where/why.

Here is the relevant portion of my solrconfig.xml file:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

Thanks for all the help so far. You all have been great.

Brian Lamb

On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:

  java.lang.NoClassDefFoundError: Could not initialize class
  org.apache.solr.handler.dataimport.DataImportHandler
  at java.lang.Class.forName0(Native Method)
 
  java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
  at
 
 org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72)
 
  Caused by: java.lang.ClassNotFoundException:
  org.slf4j.LoggerFactory
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 

 You can find slf4j- related jars in \trunk\solr\lib, but this error is
 weird.






Re: Solr performance issue

2011-03-22 Thread Alexey Serba
 Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
 to 8gb every 20 seconds or so,
 gc runs, falls down to 1gb.

Hmm, the JVM filling 8 GB every 20 seconds sounds like a lot.

Do you return all results (ids) for your queries? Any tricky
faceting/sorting/function queries?


Re: help with Solr installation within Tomcat7

2011-03-22 Thread Erick Erickson
What error are you receiving? Checking your config files for any
absolute rather than relative paths would be my first guess...

Best
Erick

On Tue, Mar 22, 2011 at 10:09 AM,  ramdev.wud...@thomsonreuters.com wrote:
 Hi All:
   I have just started using Solr and have it successfully installed within a 
 Tomcat7 Webapp server.
 I have also indexed documents using the SolrJ interfaces. The following is my 
 problem:

 I installed Solr under Tomcat7 folders and setup an xml configuration file to 
 indicate the Solr home variables as detailed on the wiki (for Solr install 
 within Tomcat)
 The indexes seem to reside within the solr_home folder under the data folder  
 (Solr_home/data/index )

 However, when I make a zip copy of the complete install (i.e. Tomcat with 
 Solr), and move it to a different machine and unzip/install it,
 The index seems to be inaccessible. (I did change the solr.xml configuration 
 variables to point to the new location)

 From what I know, with tomcat installations, it should be as simple as 
 zipping a current working installation and unzipping/installing  on a 
 different machine/location.

 Am I missing something that makes Solr hardcode the path to the index in an 
 install ?

 Simply put, I would like to know how to transport an existing install of 
 Solr within Tomcat 7 from one machine to another and still have it working.

 Ramdev
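For readers hitting the same issue: a common way to keep a Tomcat-hosted Solr install portable is to set solr/home in a context fragment outside the webapp, following the SolrTomcat wiki pattern, so only that one file needs editing on the new machine. A sketch (all paths here are placeholders; adjust to your layout):

```xml
<!-- conf/Catalina/localhost/solr.xml: JNDI context fragment (example paths) -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- Solr reads solr/home from JNDI to locate conf/ and data/ -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```

If the index still appears empty after moving, also check that dataDir in solrconfig.xml is relative (or updated) rather than an absolute path baked in from the old machine.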



Re: Architecture question about solr sharding

2011-03-22 Thread Erick Erickson
I'd just put the data in the document. That way, you're not
inferring anything, you *know* which shard (or even the
logical shard) the data came from.

Does that make sense in your problem space?

Erick

On Tue, Mar 22, 2011 at 3:20 PM, JohnRodey timothydd...@yahoo.com wrote:
 I have an issue and I'm wondering if there is an easy way around it with just
 SOLR.

 I have multiple SOLR servers and a field in my schema is a relative path to
 a binary file.  Each SOLR server is responsible for a different subset of
 data that belongs to a different base path.

 For Example...

 My directory structure may look like this:
 /someDir/Jan/binaryfiles/...
 /someDir/Feb/binaryfiles/...
 /someDir/Mar/binaryfiles/...
 /someDir/Apr/binaryfiles/...

 Server1 is responsible for Jan, Server2 for Feb, etc...

 And a response document may have a field like this
 my entry
 binaryfiles/12345.bin

 How can I tell from my main search server which server returned a result?
 I cannot put the full path in the index because my path structure might
 change in the future.  Using this example it may go to '/someDir/Jan2011/'.

 I basically need to find a way to say 'Ah! server01 returned this result, so
 it must be in /someDir/Jan'

 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Architecture-question-about-solr-sharding-tp2716417p2716417.html
 Sent from the Solr - User mailing list archive at Nabble.com.
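Erick's suggestion (storing the source identifier in each document at index time) could look like the add command below; the field names are hypothetical and would need matching schema.xml entries:

```xml
<add>
  <doc>
    <field name="id">12345</field>
    <!-- base directory this shard indexed; hypothetical field name -->
    <field name="base_path">/someDir/Jan</field>
    <field name="file_path">binaryfiles/12345.bin</field>
  </doc>
</add>
```

At query time the full location is just base_path + file_path, so a rename like /someDir/Jan to /someDir/Jan2011 only requires updating the stored base_path values, with no inference about which server answered.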



Re: help with Solr installation within Tomcat7

2011-03-22 Thread Ezequiel Calderara
Where are your Solr files (war, conf files) located? How did you instantiate
Solr in Tomcat?

On Tue, Mar 22, 2011 at 7:08 PM, Erick Erickson erickerick...@gmail.comwrote:

 What error are you receiving? Check your config files for any
 absolute rather than relative paths would be my first guess...

 Best
 Erick

 On Tue, Mar 22, 2011 at 10:09 AM,  ramdev.wud...@thomsonreuters.com
 wrote:
  Hi All:
I have just started using Solr and have it successfully installed
 within a Tomcat7 Webapp server.
  I have also indexed documents using the SolrJ interfaces. The following
 is my problem:
 
  I installed Solr under Tomcat7 folders and setup an xml configuration
 file to indicate the Solr home variables as detailed on the wiki (for Solr
 install within TOmcat)
  The indexes seem to reside within the solr_home folder under the data
 folder  (Solr_home/data/index )
 
   However when I make a zip copy of the complete install (i.e. Tomcat
 with Solr), and move it to a different machine and unzip/install it,
  The index seems to be inaccessible. (I did change the solr.xml
 configuration variables to point to the new location)
 
  From what I know, with tomcat installations, it should be as simple as
 zipping a current working installation and unzipping/installing  on a
 different machine/location.
 
  Am I missing something that makes Solr hardcode the path to the index
 in an install ?
 
   Simply put, I would like to know how to transport an existing install
  of Solr within Tomcat 7 from one machine to another and still have it
 working.
 
   Ramdev
 




-- 
__
Ezequiel.

Http://www.ironicnet.com


Multiple Cores with Solr Cell for indexing documents

2011-03-22 Thread Brandon Waterloo
Hello everyone,

I've been trying for several hours now to set up Solr with multiple cores with 
Solr Cell working on each core.  The only items being indexed are PDF, DOC, and 
TXT files (with the possibility of expanding this list, but for now, just 
assume the only things in the index should be documents).

I never had any problems with Solr Cell when I was using a single core.  In 
fact, I just ran the default installation in example/ and worked from that.  
However, trying to migrate to multi-core has been a never ending list of 
problems.

Any time I try to add a document to the index (using the same curl command as I 
did to add to the single core, of course adding the core name to the request 
URL-- host/solr/corename/update/extract...), I get HTTP 500 errors due to 
classes not being found and/or lazy loading errors.  I've copied the exact 
example/lib directory into the cores, and that doesn't work either.

Frankly the only libraries I want are those relevant to indexing files.  The 
less bloat, the better, after all.  However, I cannot figure out where to put 
what files, and why the example installation works perfectly for single-core 
but not with multi-cores.

Here is an example of the errors I'm receiving:

command prompt curl 
"host/solr/core0/update/extract?literal.id=2-3-1&commit=true" -F 
myfile=@test2.txt

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>org/apache/tika/exception/TikaException

java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.ClassNotFoundException: 
org.apache.tika.exception.TikaException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 27 more
</pre>
<p>RequestURI=/solr/core0/update/extract</p><p><i><small><a 
href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>

</body>
</html>

Any assistance you could provide or installation guides/tutorials/etc. that you 
could link me to would be greatly appreciated.  Thank you all for your time!

~Brandon Waterloo
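The NoClassDefFoundError for TikaException usually means the extraction jars are not on that core's classpath. With multiple cores, each core's solrconfig.xml needs its own lib directives; a sketch along the lines of the stock example config, with paths relative to the core's instanceDir (adjust dir/regex to your distribution layout):

```xml
<!-- in each core's solrconfig.xml; dir paths are relative to instanceDir -->
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```

This pulls in only Solr Cell and its Tika dependencies per core, so copying the whole example/lib directory around is unnecessary.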



Re: email - DIH

2011-03-22 Thread Gora Mohanty
On Tue, Mar 22, 2011 at 9:38 PM, Matias Alonso matiasgalo...@gmail.com wrote:
[...]
 The problem is that I'm indexing emails through the DataImportHandler, using
 Gmail with imaps; I do this to be able to search an email list in the future.
 The emails are only partially indexed and I can't find out why not all of
 the emails are indexed.
[...]
 I've done a full import and no errors were found, but the status showed that
 28 documents were added, while in the console I found 35 messages.
[...]

 INFO: Total messages : 35

 Mar 22, 2011 3:55:16 PM
 org.apache.solr.handler.dataimport.MailEntityProcessor$MessageIterator
 init

 INFO: Search criteria applied. Batching disabled
[...]

The above seems to indicate that the MailEntityProcessor does find
all 35 messages, but indexes only 28. Are you sure that all 35 are
dated since 2010-01-01 00:00:00? Could you try without fetchMailsSince?

Regards,
Gora
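For comparison, a minimal MailEntityProcessor data-config for the test Gora suggests might look like the sketch below (credentials are placeholders); omitting fetchMailsSince rules out the date filter as the reason messages are skipped:

```xml
<dataConfig>
  <document>
    <!-- fetchMailsSince deliberately omitted to test the date-filter theory -->
    <entity processor="MailEntityProcessor"
            user="user@gmail.com"
            password="password"
            host="imap.gmail.com"
            protocol="imaps"
            folders="INBOX"/>
  </document>
</dataConfig>
```

If all 35 messages index without the filter, the 7 missing ones likely have dates (or missing date headers) that fail the since-date comparison.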