Re: Indexing and querying BLOBS stored in Mysql

2012-08-24 Thread Alexey Serba
I would recommend creating a simple data import handler config to test Tika
parsing for large BLOBs, i.e. remove unrelated entities, remove all
the configuration for delta imports, and keep just the entity that
retrieves blobs and the entity that parses binary content
(fieldReader/TikaEntityProcessor).
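For reference, a minimal test config along those lines might look like the sketch below. This is illustrative only: the table/column names (aitiologikes_ektheseis, bin_con) and the MySQL URL are taken from the original post; everything else should be adapted.

```xml
<!-- Hypothetical stripped-down data-config.xml: no delta imports,
     just the blob-fetching entity and the Tika sub-entity. -->
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/ktimatologio"
              user="root" password="1a2b3c4d"/>
  <dataSource name="fieldReader" type="FieldStreamDataSource"/>
  <document>
    <!-- Fetch only the id and the blob column -->
    <entity name="bin_docs" dataSource="db"
            query="select id, bin_con from aitiologikes_ektheseis where type = 'bin'">
      <field column="id" name="id"/>
      <!-- Stream the blob through Tika -->
      <entity dataSource="fieldReader" processor="TikaEntityProcessor"
              dataField="bin_docs.bin_con" format="text">
        <field column="text" name="contentbin"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Once this minimal version indexes the blobs correctly, the delta-import configuration can be added back.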

Some comments:
1. Maybe you are running a delta import and there are no new records in the database?
2. deltaQuery should return only IDs and not other columns/data,
because you don't use them in deltaImportQuery (see
dataimporter.delta.id).
3. Not all entities have HTMLStripTransformer in their transformer list,
yet they use stripHTML in their fields. TemplateTransformer is not used at all.
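For point 2, a sketch of what the slimmed-down delta pair might look like (illustrative only, not tested; the "..." stands for the original column list):

```xml
<!-- deltaQuery returns only the primary keys of changed rows;
     deltaImportQuery then fetches the full row per ${dataimporter.delta.id}. -->
<entity name="aitiologikes_ektheseis" dataSource="db"
        transformer="HTMLStripTransformer"
        query="select id, ... from aitiologikes_ektheseis where type = 'text'"
        deltaQuery="select id from aitiologikes_ektheseis
                    where type = 'text'
                    and last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id, ... from aitiologikes_ektheseis
                          where type = 'text' and id='${dataimporter.delta.id}'">
  <!-- field mappings as in the original config -->
</entity>
```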

   <entity name="aitiologikes_ektheseis"
           dataSource="db"
           transformer="HTMLStripTransformer"
           query="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'"
           deltaImportQuery="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'
                  and id='${dataimporter.delta.id}'"
           deltaQuery="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'
                  and last_modified &gt; '${dataimporter.last_index_time}'">
     <field column="id" name="ida" />
     <field column="solr_id" name="solr_id" />
     <field column="title" name="title" stripHTML="true" />
     <field column="grid_title" name="grid_title" stripHTML="true" />
     <field column="model" name="model" stripHTML="true" />
     <field column="type" name="type" stripHTML="true" />
     <field column="url" name="url" stripHTML="true" />
     <field column="last_modified" name="last_modified" stripHTML="true" />
     <field column="search_tag" name="search_tag" stripHTML="true" />
     <field column="content" name="content" stripHTML="true" />
   </entity>

 <entity name="aitiologikes_ektheseis_bin"
         query="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'"
         deltaImportQuery="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'
                and id='${dataimporter.delta.id}'"
         deltaQuery="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'
                and last_modified &gt; '${dataimporter.last_index_time}'"
         transformer="TemplateTransformer"
         dataSource="db">

   <field column="id" name="ida" />
   <field column="solr_id" name="solr_id" />
   <field column="title" name="title" stripHTML="true" />
   <field column="grid_title" name="grid_title" stripHTML="true" />
   <field column="model" name="model" stripHTML="true" />
   <field column="type" name="type" stripHTML="true" />
   <field column="url" name="url" stripHTML="true" />
   <field column="last_modified" name="last_modified" stripHTML="true" />
   <field column="search_tag" name="search_tag" stripHTML="true" />

   <entity dataSource="fieldReader"
           processor="TikaEntityProcessor"
           dataField="aitiologikes_ektheseis_bin.text" format="text">
     <field column="text" name="contentbin" stripHTML="true" />
   </entity>

 </entity>

 ...
 ...
 </document>

 </dataConfig>

 *A portion from schema.xml (the fieldType and field definitions):*

 <fieldType name="text_ktimatologio" class="solr.TextField"
            positionIncrementGap="100">

   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPossessiveFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="lang/stopwords_el.txt" enablePositionIncrements="true"/>
     <filter class="solr.GreekLowerCaseFilterFactory"/>
     <filter class="solr.GreekStemFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>

   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter 

Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-24 Thread Vadim Kisselmann
A presumption: are you using old solrconfig.xml files from previous
installations? If so, compare the default config with yours.


2012/8/23 Claudio Ranieri claudio.rani...@estadao.com:
 I made this installation on a new tomcat.
 With Solr 3.4.*, 3.5.*, and 3.6.* it works with the jars in 
 $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but with Solr 4.0 beta it doesn't. I 
 needed to add the jars to $TOMCAT_HOME/lib.
 The problem with the cast seems to be in the source code.
 The problem with the cast seems to be in the source code.


 -Mensagem original-
 De: Karthick Duraisamy Soundararaj [mailto:karthick.soundara...@gmail.com]
 Enviada em: quinta-feira, 23 de agosto de 2012 09:22
 Para: solr-user@lucene.apache.org
 Assunto: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Not sure if this can help. But once I had a similar problem with Solr 3.6.0 
 where tomcat refused to find one of the classes that existed. I deleted the 
 tomcat's webapp directory and then it worked fine.

 On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 First, I'm no Tomcat expert. Here's the Tomcat Solr page, but
 you've probably already seen it:
 http://wiki.apache.org/solr/SolrTomcat

 But I'm guessing that you may have old jars around somewhere and
 things are getting confused. I'd blow away the whole thing and start
 over, whenever I start copying jars around I always lose track of
 what's where.

 Have you successfully had any other Solr operate under Tomcat?

 Sorry I can't be more help
 Erick

 On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri
 claudio.rani...@estadao.com wrote:
  Hi,
 
  I tried to start solr-4.0.0-BETA with tomcat-6.0.20 but it does not
 work.
  I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps.
  Then I
 copied the directory apache-solr-4.0.0-BETA\example\solr to
 C:\home\solr-4.0-beta and adjusted the file
 $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to
 point the solr/home to C:/home/solr-4.0-beta. With this configuration,
 when I startup tomcat I got:
 
  SEVERE: org.apache.solr.common.SolrException: Invalid
  luceneMatchVersion
 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22,
 LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32,
 LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in 
 format 'VV'
 
  So I changed the line in solrconfig.xml:
 
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
 
  to
 
  <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>
 
  So I got a new error:
 
  Caused by: java.lang.ClassNotFoundException:
 solr.NRTCachingDirectoryFactory
 
  This class is within the file apache-solr-core-4.0.0-BETA.jar, but for
 some reason the classloader does not load it. I then moved all the
 jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to
 $TOMCAT_HOME\lib.
  After this setup, I got a new error:
 
  SEVERE: java.lang.ClassCastException:
 org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to
 org.apache.solr.core.DirectoryFactory
 
  So I changed the line in solrconfig.xml:
 
  <directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
 
  to
 
  <directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
 
  So I got a new error:
 
  Caused by: java.lang.ClassCastException:
 org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to
 org.apache.solr.spelling.SolrSpellChecker
 
  How can I resolve the classloader problem?
  How can I resolve the cast problems with NRTCachingDirectoryFactory
  and DirectSolrSpellChecker?
  I cannot start Solr 4.0 beta with Tomcat.
  Thanks,
 
 
 
 




 --
 --
 Karthick D S
 Master's in Computer Engineering ( Software Track ) Syracuse University 
 Syracuse - 13210 New York United States of America


Unmatched quotes

2012-08-24 Thread Peter Kirk
Hi,

If I execute the following query, with unmatched quotes, I get an error from 
Solr - as I haven't escaped the middle ".

But the error message appears to simply be "400 null". Is it possible to get 
Solr to return a more informative error message?

http://myhost/solr/myapp/select?q=title:"cycle with 24" wheels"

Thanks,
Peter



Re: Solr - Unique Key Field Should Apply on q search or fq search

2012-08-24 Thread Ahmet Arslan
 For e.g., if I search with the below URL, then it returns
 0 rows,
 whereas such a record exists.
 
 http://localhost:8080/solr/core0/select?q=myTextFeild:politics
 programme AND
 myuniquekey:193834
 
 but if I modify my search with either of the below-mentioned search
 queries, it works
 properly.
 
 http://localhost:8080/solr/core0/select?q=myuniquekey:193834
 AND
 myTextFeild:politics programme
 
 OR 
 
 http://localhost:8080/solr/core0/select?q=myTextFeild:politics
 programmefq=myuniquekey:193834
 
 Now I don't know which would be the better option:
 should I apply the
 unique key in the query or in a filter query?


myTextFeild:politics programme is parsed as follows :

myTextFeild:politics defaultField:programme

You should use parentheses: 
q=myTextFeild:(politics programme) AND myuniquekey:193834

Filter queries are cached; if you will be re-using the same uniqueKey it is
better to use fq.
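As a sketch, the filter-query form of the last example would be:

```text
http://localhost:8080/solr/core0/select?q=myTextFeild:(politics programme)&fq=myuniquekey:193834
```

Each distinct fq value gets its own filter-cache entry, so this pays off mainly when the same uniqueKey filter is reused across requests.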



Re: Solr 4.0 Beta missing example/conf files?

2012-08-24 Thread Lance Norskog
bin/ usually goes in the collection/ directory, but nobody uses the
programs in bin/. They are all for the old rsync replicator.

lib/ can go next to solr.xml, or in a collection. In the top
directory, lib/ jars are visible to all collections. Inside a
collection, lib/ jars are only visible to that collection.
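A sketch of that layout for the 4.0 example, assuming the default collection1 core name:

```text
<solr home>/
├── solr.xml
├── lib/                 <- jars here are visible to all collections
└── collection1/
    ├── conf/
    ├── data/
    └── lib/             <- jars here are visible only to collection1
```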

On Thu, Aug 23, 2012 at 8:17 PM, Tom Burton-West tburt...@umich.edu wrote:
 Thanks Erik!

  What confused me in the README is that it wasn't clear which
  files/directories need to be in Solr home and which need to be in
  SolrHome/corename.  For example, the /conf and /data directories are
  now under the core subdirectory.  What about /lib and /bin?   Will a core
  use a conf file in SolrHome/conf if there is no SolrHome/collection1/conf
  directory?

  Also, when upgrading from a previous Solr setup that doesn't use a core, I
  was definitely confused about whether or not it is mandatory to have a core
  with Solr 4.0.  And when I tried not using a solr.xml file, it was very
  weird to still get a message about a missing collection1 core directory.

 See this JIRA issue:https://issues.apache.org/jira/browse/SOLR-3753

 Tom


 On Thu, Aug 23, 2012 at 7:56 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 Tom -

 I corrected, on both trunk and 4_x, a reference to solr/conf (to
 solr/collection1/conf) in tutorial.html.  I didn't see anything in
  example/README that needed fixing.  Was there something awry there
  that needs correcting that I missed?   If so, feel free to file a JIRA issue
  marked for 4.0 so we can be sure to fix it before the final release.

 Thanks,
 Erik

 On Aug 22, 2012, at 16:32 , Tom Burton-West wrote:

  Thanks Markus!
 
  Should the README.txt file in solr/example be updated to reflect this?
  Is that something I need to enter a JIRA issue for?
 
  Tom
 
  On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma
  markus.jel...@openindex.iowrote:
 
  Hi - The example has been moved to collection1/
 
 
 
  -Original message-
  From:Tom Burton-West tburt...@umich.edu
  Sent: Wed 22-Aug-2012 20:59
  To: solr-user@lucene.apache.org
  Subject: Solr 4.0 Beta missing example/conf files?
 
  Hello,
 
   Usually in the example/solr directory in Solr distributions there is a
   populated conf directory.  However, in the distribution I downloaded of
   Solr 4.0.0-BETA, there is no /conf directory.   Has this been moved
   somewhere?
 
  Tom
 
  ls -l apache-solr-4.0.0-BETA/example/solr
  total 107
  drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin
  drwxr-sr-x 3 tburtonw dlps   22 Jun 28 09:21 collection1
  -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt
  -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml
  -rw-r--r-- 1 tburtonw dlps  501 May 29 13:02 zoo.cfg
 
 





-- 
Lance Norskog
goks...@gmail.com


Re: Unmatched quotes

2012-08-24 Thread Ahmet Arslan
 If I execute the following query, with unmatched quotes, I
 get an error from Solr - as I haven't escaped the middle ".
 
 But the error message appears to simply be "400 null". Is it
 possible to get Solr to return a more informative error
 message?
 
 http://myhost/solr/myapp/select?q=title:"cycle with 24"
 wheels"

I don't know about the error message, but the (e)dismax query parser strips 
unbalanced quotes.
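A sketch of that workaround for the query above, switching the parser per request (defType and qf are standard edismax parameters):

```text
http://myhost/solr/myapp/select?defType=edismax&qf=title&q=cycle with 24" wheels
```

With edismax the stray inch mark is tolerated instead of producing a 400.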



Re: Data Import Handler - Could not load driver - com.microsoft.sqlserver.jdbc.SQLServerDriver - SOLR 4 Beta

2012-08-24 Thread Lance Norskog
Does this class exist in the driver jar?
com.microsoft.sqlserver.jdbc.SQLServerDriver

On Thu, Aug 23, 2012 at 9:09 AM, awb3667 adam.bu...@peopleclick.com wrote:
 Hello,

 I was able to get the DIH working in SOLR 3.6.1 (placed the sqljdbc4.jar
 file in the lib directory, etc). Everything worked great. Tried to get
 everything working in SOLR 4 beta (on the same dev machine connecting to
 same db, etc) and was unable to due to the sql driver not loading.

 What I've done:
 1. Solr 4 admin comes up fine (configured solrconfig.xml and schema.xml)
 2. Dropped the sqljdbc4.jar in the lib directory
 3. Added sqljdbc4.jar to classpath
 4. Added dataimporthandler to solrconfig.xml:
 <lib dir="../../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
 <lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />

 5. Even tried jtds which also gave me errors that the driver could not be
 loaded.

 Here is my datasource in the data-config.xml (DIH config file):
 <dataSource name="db"
     type="JdbcDataSource"
     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url="jdbc:sqlserver://DBSERVERNAME;instanceName=INST1;user=solr;password=password;applicationName=solr-DIH;databaseName=scratch"
     user="solr" password="password"/>


 Here is the error I get when trying to use the JDBC connector:
 SEVERE: Full Import failed:java.lang.RuntimeException:
 java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver Processing
 Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
 at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver Processing
 Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
 at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 Could not load driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
 Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:114)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:62)
 at
 org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:354)
 at
 org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:99)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:53)
 at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:74)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:430)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
 ... 5 more
 Caused by: java.lang.ClassNotFoundException: Unable to load
 com.microsoft.sqlserver.jdbc.SQLServerDriver or
 org.apache.solr.handler.dataimport.com.microsoft.sqlserver.jdbc.SQLServerDriver
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
 ... 12 more
 Caused by: org.apache.solr.common.SolrException: Error loading class
 'com.microsoft.sqlserver.jdbc.SQLServerDriver'
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:438)
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
 ... 13 more
 Caused by: java.lang.ClassNotFoundException:
 com.microsoft.sqlserver.jdbc.SQLServerDriver
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Unknown Source)
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)
 ... 14 more




Re: Indexing and querying BLOBS stored in Mysql

2012-08-24 Thread Alexandre Rafalovitch
I think it would greatly help if you say specifically where you are
stuck. Otherwise, there are too many directions to guess. The
configuration snippet you have is a little too large to 'parse'.

I believe DataImportHandler has some definition for nested processors,
have you tried using those and having problems?

Do you want extra custom processing for the blobs? Have you tried
writing a CustomProcessor that will call Tika and parse the content
and add it to the record? I am doing this to merge files in filesystem
with metadata records during index (for a test). If that sounds
similar to what you do, I can share my sample privately.

Otherwise, just try to be very specific about:
*) What you are trying to do
*) What you are actually doing to get there, and
*) What specifically you are getting stuck on (Exception? Missed
records? Out of memory? etc)

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Aug 23, 2012 at 2:40 PM, anarchos78
rigasathanasio...@hotmail.com wrote:
 Greetings friends,

 Straight to the point: I have stored many BLOBs in a MySQL DB. These are
 mainly PDFs (80%) and .doc files. I also have plain text in the DB. So far I
 have indexed, and can query, the text, but I cannot index the BLOBs. I am
 trying to build a single collection (document), but without success. Is there
 any recipe for how to do such a thing?

 *A portion of data-config.xml:*

 <?xml version="1.0" encoding="utf-8"?>

 <dataConfig>

   <dataSource type="JdbcDataSource"
       autoCommit="true" batchSize="-1"
       convertType="false"
       driver="com.mysql.jdbc.Driver"
       url="jdbc:mysql://127.0.0.1:3306/ktimatologio"
       user="root"
       password="1a2b3c4d"
       name="db"/>

   <dataSource name="fieldReader" type="FieldStreamDataSource" />


   <document>


   <entity name="aitiologikes_ektheseis"
           dataSource="db"
           transformer="HTMLStripTransformer"
           query="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'"
           deltaImportQuery="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'
                  and id='${dataimporter.delta.id}'"
           deltaQuery="select id, title, title AS grid_title, model, type, url,
                  last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                  CONCAT(body,' ',title) AS content
                  from aitiologikes_ektheseis where type = 'text'
                  and last_modified &gt; '${dataimporter.last_index_time}'">
     <field column="id" name="ida" />
     <field column="solr_id" name="solr_id" />
     <field column="title" name="title" stripHTML="true" />
     <field column="grid_title" name="grid_title" stripHTML="true" />
     <field column="model" name="model" stripHTML="true" />
     <field column="type" name="type" stripHTML="true" />
     <field column="url" name="url" stripHTML="true" />
     <field column="last_modified" name="last_modified" stripHTML="true" />
     <field column="search_tag" name="search_tag" stripHTML="true" />
     <field column="content" name="content" stripHTML="true" />
   </entity>

 <entity name="aitiologikes_ektheseis_bin"
         query="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'"
         deltaImportQuery="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'
                and id='${dataimporter.delta.id}'"
         deltaQuery="select id, title, title AS grid_title, model, type, url,
                last_modified, CONCAT_WS('_',id,model) AS solr_id, search_tag,
                bin_con AS text
                from aitiologikes_ektheseis where type = 'bin'
                and last_modified &gt; '${dataimporter.last_index_time}'"
         transformer="TemplateTransformer"
         dataSource="db">

   <field column="id" name="ida" />
   <field column="solr_id" name="solr_id" />
   <field column="title" name="title" stripHTML="true" />
   <field column="grid_title" name="grid_title" stripHTML="true" />
   <field column="model" name="model" stripHTML="true" />
   <field column="type" name="type" stripHTML="true" />
   <field column="url" name="url" stripHTML="true" />
   <field column="last_modified" name="last_modified" stripHTML="true" />
   <field column="search_tag" 

Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-24 Thread Lance Norskog
There are two different sets of readers for binary and character-mode
data, and I don't remember which is which. You may be reading the PDF
binary blob as a character blob.

On Wed, Aug 22, 2012 at 1:34 AM, anarchos78
rigasathanasio...@hotmail.com wrote:
 Thanks for your reply.
 I have tried many things (copyField etc.) with no success. Note that the
 pdfs are stored as BLOBs in a MySQL database. I am trying to use DIH
 to fetch the binaries from the DB. Is it possible?
 Thanks!



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-search-Tika-extracted-text-from-PDF-not-return-highlighting-snippet-tp3999647p4002587.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Is SpellCheck Case Sensitive in Solr3.6.1?

2012-08-24 Thread mechravi25
Hi,

I'm using Solr 3.6.1 now, and I configured spellcheck by making the
following changes:

Solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchekerIndex</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

and added the following in the standard handler to include the spellcheck

<arr name="last-components">
  <str>spellcheck</str>
</arr>

Schema.xml:  

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="spell" indexed="true" stored="false"
       multiValued="true" />

and used the copy field to copy all the other field's value to spelling
field

When I try to search for "list", it does not return any suggestions, but
when I try to search for "List", it returns many suggestions (in both
cases I get the same search result count, and it is not zero).
I also tried giving the field a different name, "spelling", and using
that in solrconfig.xml. It behaves the same way.

Is spell check case sensitive? What I want to achieve is to get the
same suggestions whether I enter "list" or "List".

Am I missing anything? Can someone please guide me on this?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-SpellCheck-Case-Sensitive-in-Solr3-6-1-tp4003074.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query expansion by taxonomy

2012-08-24 Thread Nicholas Ding
Hello,

I want to do query expansion on Solr; I have a taxonomy index like this:

<field name="category" type="string" indexed="true" stored="true" />
<field name="keyword" type="string" indexed="true" stored="true" />
<field name="syonym" type="string" indexed="true" multiValued="true" />

Is it possible to do one search that returns the searched keywords
and their siblings under the same category?

For example, searching for "Ford", whose category is "Car Dealer", the results
would not be limited to "Ford" but would also include "Honda", "BMW", and
"Benz" under the same category "Car Dealer".

Thanks
Nicholas


ngroups question

2012-08-24 Thread reikje
I have a question regarding expected memory consumption when using field
collapsing with the ngroups parameter. We have indexed a forum with 500,000
threads. Each thread is a group, so we can have at most 500,000 groups. I read
somewhere that for each group an org.apache.lucene.util.BytesRef is created
and added to an ArrayList. What is the content of the byte[] the BytesRef
is created with? It would help me estimate how much memory is used in the
worst case, if all groups are returned (which is unlikely).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ngroups-question-tp4003093.html
Sent from the Solr - User mailing list archive at Nabble.com.


Debugging DIH

2012-08-24 Thread Hasan Diwan
I have some data in an H2 database that I'd like to move to Solr. I
probably should/could extract and post the contents as one new document per
record, but I'd like to configure the Data Import Handler and am having
some difficulty doing so. Following the wiki instructions [1], I have the
following in my db-data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.h2.Driver"
url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
<document>
  <entity name="receipt" query="select location as location, amount as
amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
= a.id"/>
</document>
</dataConfig>

I also have dropped the JDBC driver into db/lib, witness:
% jar tvf ./lib/h2-1.3.164.jar | grep 'Driver'
13 Fri Feb 03 12:02:56 PST 2012 META-INF/services/java.sql.Driver
  2508 Fri Feb 03 12:02:56 PST 2012 org/h2/Driver.class
   485 Fri Feb 03 12:02:56 PST 2012 org/h2/util/DbDriverActivator.class

and I've added the appropriate fields to schema.xml:
  <field name="location" type="string" indexed="true" stored="true"/>
  <field name="amount" type="currency" indexed="true" stored="true"/>
  <field name="when" type="date" indexed="true" stored="true"/>

There's nothing in my index and 343 rows in my table. What is going on? -- H
-- 
Sent from my mobile device
Envoyait de mon portable
1. http://wiki.apache.org/solr/DIHQuickStart
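One way to narrow this down is DIH's debug mode — a sketch, assuming the handler is registered at /solr/db/dataimport and that the debug/verbose parameters behave as described on the DataImportHandler wiki:

```text
http://localhost:8983/solr/db/dataimport?command=full-import&debug=true&verbose=true
```

The verbose output shows each row fetched and which fields were mapped, which helps distinguish a query problem from a field-mapping problem.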


Re: Debugging DIH

2012-08-24 Thread Hasan Diwan
On 24 August 2012 07:17, Hasan Diwan hasan.di...@gmail.com wrote:

 I have some data in an H2 database that I'd like to move to SOLR. I
 probably should/could extract and post the contents as 1 new document per
 record, but I'd like to configure the data import handler and am having
 some difficulty doing so. Following the wiki instructions[1], I have the
 following in my db-data-config.xml:
  <dataConfig>
  <dataSource type="JdbcDataSource" driver="org.h2.Driver"
  url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
  <document>
    <entity name="receipt" query="select location as location, amount as
  amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
  = a.id"/>
  </document>
  </dataConfig>

 I also have dropped the JDBC driver into db/lib, witness:
 % jar tvf ./lib/h2-1.3.164.jar | grep 'Driver'
 13 Fri Feb 03 12:02:56 PST 2012 META-INF/services/java.sql.Driver
   2508 Fri Feb 03 12:02:56 PST 2012 org/h2/Driver.class
485 Fri Feb 03 12:02:56 PST 2012 org/h2/util/DbDriverActivator.class

 and I've added the appropriate fields to schema.xml:
    <field name="location" type="string" indexed="true" stored="true"/>
    <field name="amount" type="currency" indexed="true" stored="true"/>
    <field name="when" type="date" indexed="true" stored="true"/>

 There's nothing in my index and 343 rows in my table. What is going on? --
 H


One more data point:
% curl -L "http://localhost:8983/solr/db/dataimport?command=status"

 <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><lst name="initArgs"><lst name="defaults"><str
name="config">db-data-config.xml</str></lst></lst><str
name="command">status</str><str name="status">idle</str><str
name="importResponse"/><lst name="statusMessages"><str name="Total Requests
made to DataSource">1</str><str name="Total Rows Fetched">343</str><str
name="Total Documents Skipped">0</str><str name="Full Dump
Started">2012-08-24 07:19:26</str><str name="">Indexing completed.
Added/Updated: 0 documents. Deleted 0 documents.</str><str
name="Committed">2012-08-24 07:19:26</str><str name="Optimized">2012-08-24
07:19:26</str><str name="Total Documents Processed">0</str><str name="Time
taken ">0:0:0.328</str></lst><str name="WARNING">This response format is
experimental.  It is likely to change in the future.</str>
</response>



-- 
Sent from my mobile device
Envoyait de mon portable


Re: Debugging DIH

2012-08-24 Thread Andy Lester

On Aug 24, 2012, at 9:17 AM, Hasan Diwan wrote:

  <dataConfig>
     <dataSource type="JdbcDataSource" driver="org.h2.Driver"
  url="jdbc:h2:tcp://192.168.1.6/finance" user="sa" />
     <document>
       <entity name="receipt" query="select location as location, amount as
  amount, done_on as when from RECEIPTS as r join APP_USERS as a on r.user_id
  = a.id"/>
     </document>
  </dataConfig>
 
 and I've added the appropriate fields to schema.xml:
   <field name="location" type="string" indexed="true" stored="true"/>
   <field name="amount" type="currency" indexed="true" stored="true"/>
   <field name="when" type="date" indexed="true" stored="true"/>
 
 There's nothing in my index and 343 rows in my table. What is going on? -- H


I don't see anything in the DIH config that tells which columns from the 
query go into which fields in the index.  You need something like:

<field name="location" column="location" />
<field name="amount" column="amount" />
<field name="when" column="when" />

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



Re: Query expansion by taxonomy

2012-08-24 Thread Jack Krupansky

The More Like This feature may give you what you want:
http://wiki.apache.org/solr/MoreLikeThis
http://wiki.apache.org/solr/MoreLikeThisHandler

The basic idea is that you do your query on your primary field(s), then you 
take term(s) from some secondary field (your category) and re-query and add 
those results to the response.


You can use the component to integrate the secondary results in the 
primary response, or use the handler to do a separate query to get 
segregated results.
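
For example, a handler-style request might look like the following; this is a
sketch, assuming a /mlt handler is registered in solrconfig.xml, and the field
names and parameter values are illustrative:

```
http://localhost:8983/solr/mlt?q=id:doc1&mlt.fl=category&mlt.mintf=1&mlt.mindf=1&rows=10
```

With the component instead, you would add mlt=true&mlt.fl=category to a normal
/select request and read the secondary matches from the moreLikeThis section of
the response.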


-- Jack Krupansky

-Original Message- 
From: Nicholas Ding

Sent: Friday, August 24, 2012 9:15 AM
To: solr-user@lucene.apache.org
Subject: Query expansion by taxonomy

Hello,

I want to do query expansion in Solr. I have a taxonomy index like this:

<field name="category" type="string" indexed="true" stored="true" />
<field name="keyword" type="string" indexed="true" stored="true" />
<field name="syonym" type="string" indexed="true" multiValued="true" />

Is it possible to do one search that returns the searched keyword
and its siblings under the same category?

For example, when searching for "Ford", whose category is "Car Dealer", the results
would not be limited to "Ford" but would also include "Honda", "BMW", and "Benz"
under the same category "Car Dealer".

Thanks
Nicholas 



RE: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-24 Thread Claudio Ranieri
Hi Vadim,
No, I used the entire apache-solr-4.0.0-BETA\example\solr (schema.xml, 
solrconfig.xml ...)


-Original Message-
From: Vadim Kisselmann [mailto:v.kisselm...@gmail.com] 
Sent: Friday, August 24, 2012 07:26
To: solr-user@lucene.apache.org
Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

A guess:
are you using your old solrconfig.xml files from older installations?
If so, compare the default config with yours.


2012/8/23 Claudio Ranieri claudio.rani...@estadao.com:
 I made this instalation on a new tomcat.
 With Solr 3.4.*, 3.5.* and 3.6.* it works with the jars in
 $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but with Solr 4.0 beta it doesn't. I
 needed to add the jars to $TOMCAT_HOME/lib.
 The problem with the cast seems to be in the source code.


 -Original Message-
 From: Karthick Duraisamy Soundararaj 
 [mailto:karthick.soundara...@gmail.com]
 Sent: Thursday, August 23, 2012 09:22
 To: solr-user@lucene.apache.org
 Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Not sure if this can help. But once I had a similar problem with Solr 3.6.0 
 where tomcat refused to find one of the classes that existed. I deleted the 
 tomcat's webapp directory and then it worked fine.

 On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 First, I'm no Tomcat expert. Here's the Tomcat Solr page, but 
 you've probably already seen it:
 http://wiki.apache.org/solr/SolrTomcat

 But I'm guessing that you may have old jars around somewhere and 
 things are getting confused. I'd blow away the whole thing and start 
 over, whenever I start copying jars around I always lose track of 
 what's where.

 Have you successfully had any other Solr operate under Tomcat?

 Sorry I can't be more help
 Erick

 On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri 
 claudio.rani...@estadao.com wrote:
  Hi,
 
  I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but it does not
 work.
  I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps.
  Then I
 copied the directory apache-solr-4.0.0-BETA\example\solr to 
 C:\home\solr-4.0-beta and adjusted the file 
 $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to 
 point the solr/home to C:/home/solr-4.0-beta. With this 
 configuration, when I startup tomcat I got:
 
  SEVERE: org.apache.solr.common.SolrException: Invalid 
  luceneMatchVersion
 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, 
 LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, 
 LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in 
 format 'VV'
 
  So I changed the line in solrconfig.xml:
 
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
 
  to
 
  <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>
 
  So I got a new error:
 
  Caused by: java.lang.ClassNotFoundException:
 solr.NRTCachingDirectoryFactory
 
  This class is within the file apache-solr-core-4.0.0-BETA.jar but for
 some reason the class is not loaded by the classloader. I then moved all the
 jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to 
 $TOMCAT_HOME\lib.
  After this setup, I got a new error:
 
  SEVERE: java.lang.ClassCastException:
 org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to 
 org.apache.solr.core.DirectoryFactory
 
  So I changed the line in solrconfig.xml:
 
  <directoryFactory name="DirectoryFactory"
      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
 
  to
 
  <directoryFactory name="DirectoryFactory"
      class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
 
  So I got a new error:
 
  Caused by: java.lang.ClassCastException:
 org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to 
 org.apache.solr.spelling.SolrSpellChecker
 
  How can I resolve the classloader problem?
  How can I resolve the ClassCastException problems with NRTCachingDirectoryFactory
  and DirectSolrSpellChecker?
  I cannot start solr 4.0 beta with tomcat.
  Thanks,
  Thanks,
 
 
 
 




 --
 --
 Karthick D S
 Master's in Computer Engineering ( Software Track ) Syracuse 
 University Syracuse - 13210 New York United States of America


Re: Solr - Unique Key Field Should Apply on q search or fq search

2012-08-24 Thread Jack Krupansky
A query such as q=myTextFeild:politics programme will search for 
"programme" in the default search field, which may not have any hits. An 
explicit field name applies only to the immediately following term or 
parenthesized sub-query.


The second and third queries work because the default operator is OR, so 
it doesn't matter that programme can't be found.


Maybe you meant q=myTextFeild:(politics programme)

Or, actually, q=myTextFeild:(politics AND programme) or 
q=myTextFeild:(+politics +programme)


-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Friday, August 24, 2012 7:54 AM
To: solr-user@lucene.apache.org
Subject: Solr - Unique Key Field Should Apply on q search or fq search

I am currently applying the unique key field search in the q parameter, but sometimes
it causes issues with text search.

For example, if I search with the URL below, it returns 0 rows,
whereas such a record exists.

http://localhost:8080/solr/core0/select?q=myTextFeild:politics programme AND
myuniquekey:193834

but if I modify my search with either of the queries below, it works
properly.

http://localhost:8080/solr/core0/select?q=myuniquekey:193834 AND
myTextFeild:politics programme

OR

http://localhost:8080/solr/core0/select?q=myTextFeild:politics
programmefq=myuniquekey:193834

Now I don't know which is the better option: should I apply the
unique key in the query (q) or in the filter query (fq)?

Please suggest.
Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Unique-Key-Field-Should-Apply-on-q-search-or-fq-search-tp4003066.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Unmatched quotes

2012-08-24 Thread Jack Krupansky

1. You can look at the Solr log file and see this exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 
'title:"cycle with "24': Lexical error at line 1, column 21.  Encountered: 
<EOF> after : "\"cycle with \"24"
   at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
   at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)

   at org.apache.solr.search.QParser.getQuery(QParser.java:143)
   at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)

   ... 22 more
Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at 
line 1, column 21.  Encountered: <EOF> after : "\"cycle with \"24"


2. Upgrade to Solr 4.0-BETA and Solr will give you this response:

<response>
<lst name="responseHeader">...</lst>
<lst name="error">
<str name="msg">
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 
'title:"cycle with "24': Lexical error at line 1, column 21. Encountered: 
<EOF> after : "\"cycle with \"24"
</str>
<int name="code">400</int>
</lst>
</response>

-- Jack Krupansky

-Original Message- 
From: Peter Kirk

Sent: Friday, August 24, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Unmatched quotes

Hi,

If I execute the following query, with unmatched quotes, I get an error from 
Solr - as I haven't escaped the middle ".


But the error message appears to simply be 400 null. Is it possible to get 
Solr to return a more informative error message?


http://myhost/solr/myapp/select?q=title:"cycle with "24" wheels"

Thanks,
Peter 
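

Client-side, the unmatched quote can also be avoided by escaping query syntax
characters before building the URL. Recent SolrJ versions ship
ClientUtils.escapeQueryChars for this; the class and method names below are
illustrative, a plain-Java sketch of the same idea:

```java
public class QueryEscaper {
    // Backslash-escape Lucene/Solr query syntax characters (including quotes
    // and spaces) so raw user input can be embedded in a query string.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: cycle\ with\ \"24\"\ wheels
        System.out.println(escape("cycle with \"24\" wheels"));
    }
}
```

Escaping the stray quote turns it into a literal character instead of a phrase
delimiter, so the lexer never hits an unterminated quoted string.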



Re: Solr Index problem

2012-08-24 Thread Chantal Ackermann

 Are you committing? You have to commit for them to be actually added….

If DIH says it did not add any documents ("added 0 documents"), committing won't 
help.

Likely, there is a problem with the mapping between DIH and the schema so that 
none of the fields make it into the index. We would need the DIH and the schema 
file, as Andy pointed out already.

Cheers,
Chantal



 

 -Original Message-
 From: ranmatrix S [mailto:ranmat...@gmail.com] 
 Sent: Thursday, August 23, 2012 5:46 PM
 To: solr-user@lucene.apache.org
 Subject: Solr Index problem
 
 Hi,
 
 I have setup Solr to index data from Oracle DB through DIH handler. However 
 through Solr admin I could see the DB connection is successfull, data 
 retrieved from DB to Solr but not added into index. The message is that 0 
 documents added even when I am able to see that 9 records are returned back.
 
 The schema and fields in db-data-config.xml are one and the same.
 
 Please suggest if anything I should look for.
 
 --
 Regards,
 Ran...



Re: Debugging DIH

2012-08-24 Thread Chantal Ackermann
 
 I don't see that you have anything in the DIH that tells what columns from 
 the query go into which fields in the index.  You need something like
 
  <field name="location" column="location" />
  <field name="amount" column="amount" />
  <field name="when" column="when" />
 

That is not completely true. If the columns have the same names as the fields, 
the mapping is redundant. Nevertheless, it might be the problem. What I've 
experienced with Oracle, at least, is that the columns would be returned in 
uppercase even if my alias would be in lowercase. You might force it by adding 
quotes, though. Or try adding

<field name="location" column="LOCATION" />
<field name="amount" column="AMOUNT" />
<field name="when" column="WHEN" />

You might check in your preferred SQL client how the column names are returned. 
It might be an indicator. (At least, in my case they would be uppercase in SQL 
Developer.)

Cheers,
Chantal
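
Quoting the aliases is one way to pin their case in most databases; it also
sidesteps `when` being a reserved word in many SQL dialects. A hedged sketch
of the query from the original post, with quoted aliases:

```sql
select r.location as "location",
       r.amount   as "amount",
       r.done_on  as "when"
from RECEIPTS as r
join APP_USERS as a on r.user_id = a.id
```

Whether quoting preserves lowercase exactly depends on the database; check how
your JDBC driver reports the column labels before relying on it.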

Re: Solr Index problem

2012-08-24 Thread Michael Della Bitta
Have you investigated the logs of your servlet container? There's
probably some explanation for why the documents weren't submitted in
there.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Thu, Aug 23, 2012 at 5:46 PM, ranmatrix S ranmat...@gmail.com wrote:
 Hi,

 I have setup Solr to index data from Oracle DB through DIH handler. However
 through Solr admin I could see the DB connection is successfull, data
 retrieved from DB to Solr but not added into index. The message is that 0
 documents added even when I am able to see that 9 records are returned
 back.

 The schema and fields in db-data-config.xml are one and the same.

 Please suggest if anything I should look for.

 --
 Regards,
 Ran...


Re: Debugging DIH

2012-08-24 Thread Ahmet Arslan

 That is not completely true. If the columns have the same
 names as the fields, the mapping is redundant. Nevertheless,
 it might be the problem. What I've experienced with Oracle,
 at least, is that the columns would be returned in uppercase
 even if my alias would be in lowercase. You might force it
 by adding quotes, though. Or try adding
 
  <field name="location" column="LOCATION" />
  <field name="amount" column="AMOUNT" />
  <field name="when" column="WHEN" />
 
 You might check in your preferred SQL client how the column
 names are returned. It might be an indicator. (At least, in
 my case they would be uppercase in SQL Developer.)

There is a jsp page for debugging DIH

http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport


Re: Query-side Join work in distributed Solr?

2012-08-24 Thread Erick Erickson
Right, there hasn't been any action on that patch in a while...

Best
Erick

On Wed, Aug 22, 2012 at 12:18 PM, Timothy Potter thelabd...@gmail.com wrote:
 Just to clarify that query-side joins ( e.g. {!join from=id
 to=parent_signal_id_s}id:foo ) do not work in a distributed mode yet?
 I saw LUCENE-3759 as unresolved but also saw some Twitter traffic
 saying there was a patch available.

 Cheers,
 Tim


Re: Index version generation for Solr 3.5

2012-08-24 Thread Erick Erickson
This is quite possible if you have multiple commits
between replications. You should _not_ depend on
the version number of an index changing in a pre-defined
way, it'll increase on a commit, but that's about all you
can really count on...

The slaves do not increment the index version, they just
get it from the master...

Best
Erick

On Wed, Aug 22, 2012 at 12:41 PM, Xin Li xin.li@gmail.com wrote:
 Hi,

 I ran into an issue lately with the index version and generation for Solr 3.5.

 In Solr 1.4, the index version of the slave increments upon each
 replication. However, I noticed that's not the case for Solr 3.5; the
 index version can increase by 20 or 30 after a replication. Does anyone
 know why, and is there any reference on the web for this?
 The index generation does still increment after replication though.

 Thanks,

 Xin


Re: Group count in SOLR 3.3

2012-08-24 Thread Erick Erickson
3.6 has a getNGroups, does that do what you want?

Best
Erick

On Thu, Aug 23, 2012 at 2:23 AM, Roman Slavík sla...@effectiva.cz wrote:
 Hi guys,

 we are using SOLR 3.3 with Solrj inside our Java project. In the current version
 we had to add some grouping support, so we add parameters to the SolrQuery
 object like this:

 query.setParam(GroupParams.GROUP, true);
 query.setParam(GroupParams.GROUP_MAIN, true);
 query.setParam(GroupParams.GROUP_FIELD, OUR_GROUP_FIELD);

 and we get QueryResponse with results we need. Awesome!

 But now I have one remaining problem: I don't know how to get the number of groups
 from the QueryResponse. I found I must add the group.ngroups=true param to the query.
 So I did:

query.setParam(GroupParams.GROUP_TOTAL_COUNT, true);

 But the QueryResponse seems the same. There's no method like getGroupCount() and
 no group count param in the header.

 Am I doing something wrong? Or is it a SOLR 3.3 problem? If we upgrade to a
 newer version, will it work?

 Thanks for any advice!

 Roman


Re: Can't extract Outlook message files

2012-08-24 Thread Erick Erickson
Hmmm, it kind of looks like your file doesn't have an id field, but
that's just guessing based on your statement that providing an ID
works just fine. Does it work if you take the uniqueKey definition
out of your schema.xml (and you'll also
have to remove the 'required=true ' from the id field)?

But this is a wild shot in the dark

Best
Erick

On Thu, Aug 23, 2012 at 5:27 AM, Alexander Cougarman acoug...@bwc.org wrote:
 Hi. We're trying to use the following Curl command to perform an extract 
 only of *.MSG file, but it blows up:

    curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F 
  "myfile=@92.msg"
 
  If we do this, it works fine:
 
    curl 
  "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F 
  "myfile=@92.msg"

 We've tried a variety of MSG files and they all produce the same error; they 
 all have content in them. What are we doing wrong?

 Here's the exception the extractOnly=true command generates:

 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 500 null</title>

 org.apache.solr.common.SolrException
   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException
 from org.apache.tika.parser.microsoft.OfficeParser@aaf063
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
   ... 23 more
 Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
   at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
   at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
   at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
   at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
   at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
   at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
   at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
   at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
   at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
   at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
   at ...

Re: Boosting documents matching in a specific shard

2012-08-24 Thread Erick Erickson
Well, the simplest would be to include the shard ID in the document
when you index it, then just boost on that field...

Best
Erick
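
For example (a sketch; the field name shard_id and the boost value are made
up), each document could carry its shard in a simple string field:

```
<field name="shard_id" type="string" indexed="true" stored="true"/>
```

Then with the (e)dismax parser you can boost matches from one shard at query
time with a boost query, e.g. &defType=edismax&bq=shard_id:shardA^10, without
touching the stored documents again.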

On Thu, Aug 23, 2012 at 8:33 AM, Husain, Yavar yhus...@firstam.com wrote:
 I am aware that IDF is not distributed. Suppose I have to boost or give 
 higher rank to documents which are matching in a specific/particular shard, 
 how can I accomplish that?
 **
 This message may contain confidential or proprietary information intended 
 only for the use of the
 addressee(s) named above or may contain information that is legally 
 privileged. If you are
 not the intended addressee, or the person responsible for delivering it to 
 the intended addressee,
 you are hereby notified that reading, disseminating, distributing or copying 
 this message is strictly
 prohibited. If you have received this message by mistake, please immediately 
 notify us by
 replying to the message and delete the original message and any copies 
 immediately thereafter.

 Thank you.-
 **
 FAFLD



Re: Query regarding multi core search

2012-08-24 Thread Erick Erickson
Why do you have 4 cores in the first place? The usual use-case
is that cores aren't for similar documents

But the easiest thing to do would be to include the sort field
in the response and have the app (or whatever is aggregating
the 4 responses) sort the responses, essentially merging the
4 sets of documents. Paging will be a problem here though...

If the 4 cores are identical, why not
1 put them all in the same core
or
2 make them shards?

Best
Erick
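
The aggregation step described above amounts to a k-way merge of per-core
result lists that are each already sorted by the sort field. A minimal
plain-Java sketch (not SolrJ API; the integer lists stand in for per-core
sort-field values):

```java
import java.util.*;

public class MergeSortedResults {
    // Merge N per-core result lists, each already sorted ascending,
    // into one globally sorted list using a priority queue.
    public static List<Integer> merge(List<List<Integer>> perCore) {
        // queue entries: {value, coreIndex, offsetWithinCore}
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < perCore.size(); i++) {
            if (!perCore.get(i).isEmpty()) {
                pq.add(new int[]{perCore.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            out.add(e[0]);
            List<Integer> src = perCore.get(e[1]);
            if (e[2] + 1 < src.size()) {
                pq.add(new int[]{src.get(e[2] + 1), e[1], e[2] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList(
            Arrays.asList(1, 4, 9), Arrays.asList(2, 3), Arrays.asList(5))));
        // prints: [1, 2, 3, 4, 5, 9]
    }
}
```

Paging remains the hard part: to return page N you still have to fetch the
first N pages from every core before merging, which is exactly why sharding is
usually the better answer.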

On Thu, Aug 23, 2012 at 8:49 AM, ravicv ravichandra...@gmail.com wrote:
 Hi,

 How is sorting done in Solr with multiple cores, say 20 cores? In a
 multi-core search it should search all cores and then sort the complete
 results... please correct me if I am wrong.

 In our scenario we are executing the same query on 4 cores and finally sorting
 the results based on one field. It works well. But I want to implement
 something similar within Solr. Can anyone suggest some code or a blog
 about this?

 I have tried some approaches, but they take more memory :(

 Thanks,
 Ravi



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Query-regarding-multi-core-search-tp4002847.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Porting Lucene Index to Solr: ERROR:SCHEMA-INDEX-MISMATCH

2012-08-24 Thread Erick Erickson
Trie fields index extra information to aid in ranges etc. So if
you indexed your data as non-trie, then asked Solr to read them
as trie fields, it's bound to be unfortunate. Or if you changed the
precisionstep. Or.

Your schema has to exactly reflect what your lucene program
did for indexing, and my guess is it doesn't...

Best
Erick
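
Concretely, one assumption worth checking (not a confirmed fix): Lucene's
NumericField constructor used here indexes with the library default
precisionStep of 4, while the schema below declares precisionStep="0". Aligning
the schema with what the Lucene.Net program actually wrote, e.g.:

```
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="4"
    omitNorms="true" positionIncrementGap="0"/>
```

may resolve the mismatch; verify the precision step your indexing code really
used before changing the schema.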

On Thu, Aug 23, 2012 at 10:55 AM, Petra Lehrmann
petra.lehrm...@tu-dortmund.de wrote:
 Hello all!

 I already posted this question to Stackoverflow
 (http://stackoverflow.com/questions/12027451/solr-net-query-argumentexception-in-windows-forms)
 but as is, either my question is too specific or just too trivial. I don't
 know. But maybe I'm just trying to put the cart before the horse.

 I have a C# Windows Forms application up and running. It uses the Lucene.Net
 library, with which I created a Lucene index (off of a Postgres database).
 There are some articles which have more than one value, so I decided to take
 numeric fields into account and used them in my application as:

 var valueField = new NumericField(internalname, Field.Store.YES, true);
 valueField.SetDoubleValue(value);
 doc.Add(valueField);

 I can open my Lucene index in Luke and see all those nice fields I made, so
 there should be no problem with the index; plus: my application searches and
 displays the result sets of the Lucene index quite fine.
 So I thought about trying Solr and read that I could use the Lucene index at
 hand - I just had to edit the schema.xml file from Solr, which I did. For
 the numeric field variables of the Lucene index I read somewhere that I have
 to use TrieFields, so I updated my schema.xml as follows:

 <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="0"
     omitNorms="true" positionIncrementGap="0"/>
 [...]
 <field name="LuminaireId" type="string" indexed="true" stored="true"/>
 <field name="LampCount" type="tdouble" multiValued="true" indexed="true"
     stored="true" required="false"/>
 [...]

 For those fields which use the numeric field, I had Solr's TrieDoubleField in
 mind and changed them. Firing up Solr on Tomcat and hitting it with a
 search query like LampCount:1 returned all the right documents. But the
 xml output always says:

 <arr name="LampCount">
   <str>ERROR:SCHEMA-INDEX-MISMATCH,stringValue=1</str>
 </arr>


 This could be the reason why my C# application is not running properly
 (using the solrnet library as the bridge between the Solr instance and the
 application) and always throws an ArgumentException when hitting my solrnet
 implementation with:

 var results = solr.Query("LampCount:1");

 But first things first: I'm not sure why there is this index mismatch and
 how to solve it - maybe I just didn't understand the explanation of
 TrieFields or the port from NumericFields?

 Any help would be greatly appreciated. :)

 Greetings from Germany,

 Petra

 





Re: Bitmap field in solr

2012-08-24 Thread Erick Erickson
There are a couple of open JIRAs, but native bitwise support isn't in
the code yet. See
SOLR-1913 and SOLR-1918

Best
Erick

On Thu, Aug 23, 2012 at 4:31 PM, Andy Lester a...@petdance.com wrote:

 On Aug 23, 2012, at 2:54 PM, Rohit Harchandani wrote:

 Hi all,
 Is there any way to have a bitmap field in Solr??
 I have a use case where I need to search specific attributes of a document.
 Rather than having an is_A, is_B, is_C (all related to each other), etc., how
 would I store all this data in a single field and still be able to query
 it? Can it be done in any way apart from storing them as strings in a text
 field?


 You can have a field that is multiValued.  It still needs a base type, like 
 string or int.  For instance, in my book database, I have a field called 
 classifications and it is multivalued.

  <field name="classifications" type="string" multiValued="true" />

 A classification of 1 means spiralbound, and 2 means large print and 3 
 means multilingual and so on.  So if my user wants to search for a 
 multilingual book, I search for classifications:3.  If you want spiralbound 
 large print, you'd search for classifications:1 classifications:2.

 xoa

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance



turning up logging using the web UI, can't get more than INFO

2012-08-24 Thread Kevin Goess
We have a pretty standard out-of-the-box solr/jetty setup.  Using the web
UI at /solr/admin/logging, for WARNING or SEVERE we get less logging, but
none of CONFIG, FINE or FINEST result in any *more* logging than just at
INFO.

Is there another place to look for something that might be controlling
that?  Maybe a threshold=INFO somewhere?  We've been unable to find
anything.

We're trying to turn up logging because our solr indexing server is hanging
at least once a day and we're trying to find out why.  It becomes
unresponsive and we have to kill -9 it.


What are the available parameters in field tag in schema.xml, and data-config.xml ?

2012-08-24 Thread srinalluri
I want to know the XSDs of schema.xml and data-config.xml.

Basically I want to know the available parameters of the field tag in schema.xml
and in data-config.xml.

For example, in schema.xml the field tag has a parameter called
'default'; what else is available?
For example, in data-config.xml the field tag has a parameter called
'splitBy'; what else is available?

thanks
Srini 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-are-the-available-parameters-in-field-tag-in-schema-xml-and-data-config-xml-tp4003142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What are the available parameters in field tag in schema.xml, and data-config.xml ?

2012-08-24 Thread Ahmet Arslan
 For example, in schema.xml, field tag is having a parameter
 called
 'default', what else are available?

Here is the full list:
http://wiki.apache.org/solr/SchemaXml#Common_field_options

 For example, in data-config.xml, field tag is having a
 parameter called
 'splitBy', what else are available?

splitBy is a parameter of RegexTransformer. 
http://wiki.apache.org/solr/DataImportHandler#Transformer


Re: turning up logging using the web UI, can't get more than INFO

2012-08-24 Thread Ahmet Arslan
 We have a pretty standard
 out-of-the-box solr/jetty setup.  Using the web
 UI at /solr/admin/logging, for WARNING or SEVERE we get less
 logging, but
 none of CONFIG, FINE or FINEST result in any *more* logging
 than just at
 INFO.
 
 Is there another place to look for something that might be
 controlling
 that?  Maybe a threshold=INFO somewhere?  We've
 been unable to find
 anything.
 
 We're trying to turn up logging because our solr indexing
 server is hanging
 at least once a day and we're trying to find out why. 
 It becomes
 unresponsive and we have to kill -9 it.
Taken from solr-trunk/solr/example/README.txt

By default, Solr will log to the console. This can be convenient when first 
getting started, but eventually you will want to log to a file. To enable 
logging, you can just pass a system property to Jetty on startup:

java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar
 
This will use Java Util Logging to log to a file based on the config in
etc/logging.properties. Logs will be written in the logs directory. It is
also possible to setup log4j or other popular logging frameworks.
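
A minimal etc/logging.properties along those lines, as a sketch: note that the
JUL ConsoleHandler's own level defaults to INFO, which is one common reason
levels below INFO never show up even after the web UI lowers the logger level.

```
# write FINE-level logs to a rolling file and the console
handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level = FINE
java.util.logging.FileHandler.pattern = logs/solr.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.level = FINE
# raise the handler ceiling too, or FINE records are filtered out here
java.util.logging.ConsoleHandler.level = FINE
```

Both the logger level (set via the web UI or .level) and the handler level must
allow a record through before it is written anywhere.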


Re: Is SpellCheck Case Sensitive in Solr3.6.1?

2012-08-24 Thread Kiran Jayakumar
You are missing the query analyzer field type: add this line to your search
component.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <lst name="spellchecker">
  ...


On Fri, Aug 24, 2012 at 5:31 AM, mechravi25 mechrav...@yahoo.co.in wrote:

 Hi,

 I'm using Solr version 3.6.1 now and I configured spellcheck by making the
 following changes:

 Solrconfig.xml:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
     <str name="classname">solr.IndexBasedSpellChecker</str>
     <str name="spellcheckIndexDir">./spellchekerIndex</str>
     <str name="field">spell</str>
     <str name="buildOnCommit">true</str>
   </lst>
 </searchComponent>

 and added the following in the standard handler to include the spellcheck

 <arr name="last-components">
   <str>spellcheck</str>
 </arr>

 Schema.xml:

 <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="spell" type="spell" indexed="true" stored="false"
     multiValued="true" />

 and used copyField to copy all the other fields' values into the spell
 field.

 When I try to search for "list", it does not return any suggestions; but
 when I try to search for "List", it returns many suggestions (but in both
 cases I'm getting the same search result count, and it's not zero).
 I also tried giving a different field name, "spelling", and tried to use
 it in solrconfig.xml. That behaves the same way.

 Is spell check case sensitive? What I want to achieve is to get the same
 suggestions when I enter both "list" and "List".

 Am I missing anything? Can someone please guide me on this?

 Thanks



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-SpellCheck-Case-Sensitive-in-Solr3-6-1-tp4003074.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-24 Thread Fuad Efendi
Hi there,

"Load Term Info" shows 3650 for a specific term MyTerm, but when I execute the
query channel:MyTerm it shows 650 documents found… possibly a bug… it
happens after I commit data too, nothing changes; and this field is a
single-valued, non-tokenized string.

-Fuad

-- 
Fuad Efendi
416-993-2060
http://www.tokenizer.ca





Re: turning up logging using the web UI, can't get more than INFO

2012-08-24 Thread Kevin Goess
On Fri, Aug 24, 2012 at 10:23 AM, Ahmet Arslan iori...@yahoo.com wrote:

  We have a pretty standard out-of-the-box solr/jetty setup.  Using the
  web UI at /solr/admin/logging, for WARNING or SEVERE we get less
  logging, but none of CONFIG, FINE or FINEST result in any *more*
  logging than just at INFO.
 Taken from solr-trunk/solr/example/README.txt

 By default, Solr will log to the console. This can be convenient when
 first getting started, but eventually you will want to log to a file. To
 enable logging, you can just pass a system property to Jetty on startup:

 java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar


Thanks, Ahmet. We're actually fine capturing console logging to a file
right now.  But are you saying that the FINEST/FINE/CONFIG levels on the
web UI at /solr/admin/logging/ won't work without a logging config file set
up?
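
One hedged guess about why the finer levels appear to do nothing: with
java.util.logging, the ConsoleHandler itself defaults to INFO, so even when
the admin UI sets a logger to FINE/FINEST, the handler can still discard
those records before they reach the console. A minimal etc/logging.properties
raising both (a sketch for the stock Jetty example setup, not verified against
this installation) could be:

.level = FINEST

# The console handler filters independently of logger levels;
# raise it too, or FINE/FINEST records are dropped here.
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = FINEST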





-- 
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgo...@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing


Solr 4.0 beta deadlock / file descriptor spike

2012-08-24 Thread Casey Callendrello
Hi there,
I have been doing some load testing with Solr 4 beta (now, trunk). My
configuration is fairly simple - two servers, replicating via SolrCloud.
SolrCloud is configured as recommended in the wiki:

<updateRequestProcessorChain name="standard">
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.DistributedUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Twice now I've seen sudden thread and file-descriptor spikes along with
a complete deadlock, simultaneously on both machines. My max FDs is set
to 1024, and (excepting the spikes) I never see usage over 375 fds.

The first FD spike was with an older trunk revision. It was coincident
with a corrupt transaction log. I've lost the logs, unfortunately, but
Solr tried to re-process the same log over and over, leaking FDs and dying.

The upgraded version has not reported the corrupt transaction issue
prior to deadlock. However, according to the log files, the deadlock
persists for about 5 minutes prior to FD exhaustion. The last log line
is simply "INFO: end_commit_flush".

Upon restart, I see a frightening number of corrupt-transaction-log
exceptions and "New transaction log already exists" exceptions.

Any thoughts?
Contact me for the thread dump; it's 1 MiB.

Thanks,
--Casey C.




Re: Query-side Join work in distributed Solr?

2012-08-24 Thread Pavel Goncharik
Do I understand correctly that once
https://issues.apache.org/jira/browse/SOLR-2592 is resolved, it will
make both distributed joins and field collapsing work?

Best regards, Pavel

On Fri, Aug 24, 2012 at 6:01 PM, Erick Erickson erickerick...@gmail.com wrote:
 Right, there hasn't been any action on that patch in a while...

 Best
 Erick

 On Wed, Aug 22, 2012 at 12:18 PM, Timothy Potter thelabd...@gmail.com wrote:
 Just to clarify that query-side joins ( e.g. {!join from=id
 to=parent_signal_id_s}id:foo ) do not work in a distributed mode yet?
 I saw LUCENE-3759 as unresolved but also saw some Twitter traffic
 saying there was a patch available.

 Cheers,
 Tim


Re: Query-side Join work in distributed Solr?

2012-08-24 Thread Erick Erickson
Not as I understand it. All that allows is a pluggable assignment of
documents to shards in SolrCloud. There's nothing tying that JIRA to
distributed joins or field collapsing.

Distributed grouping is already in place as of Solr 3.5, see:
https://issues.apache.org/jira/browse/SOLR-2066

Best
Erick

On Fri, Aug 24, 2012 at 2:49 PM, Pavel Goncharik
pavel.goncha...@gmail.com wrote:
 Do I understand correctly that once
 https://issues.apache.org/jira/browse/SOLR-2592 is resolved, it will
 make both distributed joins and field collapsing work?

 Best regards, Pavel



Re: ngroups question

2012-08-24 Thread Erick Erickson
I think the memory size is about (number of groups) * ((size of the
key) + (a little memory for the bucket that holds the members of that
group)). The latter is, I'm guessing here, quite small.

Sure, you can have all 500,000 groups consume memory, quite easily:
q=*:* (OK, that one wouldn't be scored, but you get the idea). Whether
they're returned or not is not germane; they all have to be counted
(Martijn may jump all over _that_). Consider some group X with a
low-scoring document in it. When could you _know_ that you don't need
to return that group? Unfortunately, not until the very last document
is scored, since it could be a perfect match for the query.
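
As a rough back-of-the-envelope check of that estimate (the per-entry
overhead figure below is an assumption, not a measured number):

```python
# Worst-case memory for group collapsing: one entry per group, each
# holding the group-key bytes plus list/object overhead. The ~48-byte
# per-entry overhead is a guess (object headers plus array-list slots
# on a 64-bit JVM); measure before trusting it.
def estimate_group_memory(num_groups: int, avg_key_bytes: int,
                          per_entry_overhead: int = 48) -> int:
    return num_groups * (avg_key_bytes + per_entry_overhead)

# 500,000 groups with ~16-byte keys: on the order of tens of megabytes.
total = estimate_group_memory(500_000, avg_key_bytes=16)
print(f"~{total / (1024 * 1024):.1f} MiB")  # → ~30.5 MiB
```

So even the full worst case stays modest for 500,000 short keys; key
length dominates once keys get long.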

Best
Erick

On Fri, Aug 24, 2012 at 10:11 AM, reikje reik.sch...@gmail.com wrote:
 I have a question regarding expected memory consumption when using field
 collapsing with the ngroups parameter. We have indexed a forum with 500,000
 threads. Each thread is a group, so we can have at most 500,000 groups. I read
 somewhere that for each group an org.apache.lucene.util.BytesRef is created
 and added to an ArrayList. What's the content of the byte[] the BytesRef
 is created with? It will help me estimate how much memory is used in the
 worst case, when all groups are returned (which is unlikely).



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/ngroups-question-tp4003093.html
 Sent from the Solr - User mailing list archive at Nabble.com.


How do I represent a group of custom key/value pairs

2012-08-24 Thread Sheldon P
I've just started to learn Solr and I have a question about modeling data
in the schema.xml.

I'm using SolrJ to interact with my Solr server.  It's easy for me to store
key/value pairs where the key is known.  For example, if I have:

title="Some book title"
author="The author's name"


I can represent that data in the schema.xml file like this:

<field name="title" type="text_general" indexed="true"
stored="true"/>
<field name="author" type="text_general" indexed="true"
stored="true"/>

I also have data that is stored as a Java HashMap, where the keys are
unknown:

Map<String, String> map = new HashMap<String, String>();
map.put("some unknown key", "some unknown data");
map.put("another unknown key", "more unknown data");


I would prefer to store that data in Solr without losing its hierarchy.
 For example:

<field name="map" type="maptype" indexed="true" stored="true">

<field name="some unknown key" type="text_general" indexed="true"
stored="true"/>

<field name="another unknown key" type="text_general" indexed="true"
stored="true"/>

</field>


Then I could search for "some unknown key", and receive "some unknown data".

Is this possible in Solr?  What is the best way to store this kind of data?


More debugging DIH - URLDataSource

2012-08-24 Thread Carrie Coy
I'm trying to write a DIH to incorporate page view metrics from an XML 
feed into our index.   The DIH makes a single request, and updates 0 
documents.  I set log level to finest for the entire dataimport 
section, but I still can't tell what's wrong.  I suspect the XPath.   
http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport returns 
404.  Any suggestions on how I can debug this?


solr-spec: 4.0.0.2012.08.06.22.50.47


The XML data:

<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse>
<Data>
<Rows>
<Row rowKey="P#PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)#N/A#5516196614" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>

<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row rowKey="P#PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)#N/A#5521976460" rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>

<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows>
</Data>
</ReportDataResponse>

My DIH:

<dataConfig>
  <dataSource name="coremetrics"
              type="URLDataSource"
              encoding="UTF-8"
              connectionTimeout="5000"
              readTimeout="1"/>

  <document>
    <entity name="coremetrics"
            dataSource="coremetrics"
            pk="id"
            url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&amp;username=&amp;format=XML&amp;userAuthKey=&amp;language=en_US&amp;viewID=9475540&amp;period_a=M20110930"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/ReportDataResponse/Data/Rows/Row"
            logLevel="fine"
            transformer="RegexTransformer">

      <field column="part_code" name="id"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
             regex="/^PRODUCT:.*\((.*?)\)$/"
             replaceWith="$1"/>
      <field column="page_views"
             xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']"/>
    </entity>
  </document>
</dataConfig>

This little test Perl script correctly extracts the data:

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'cm.xml');
my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
foreach my $node ($nodeset->get_nodelist) {
    my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
    my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
    $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
}
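
Along the same lines, the XPath and the regex can be sanity-checked in
isolation. One suspicious detail in the DIH config: the regex attribute is
wrapped in /.../ delimiters, which Java's regex engine treats as literal
slashes, so the pattern can never match; DIH expects the bare pattern. A
small stdlib-only Python check against the sample XML (abbreviated from the
feed above; the rowKey values are elided here):

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<ReportDataResponse><Data><Rows>
<Row rowKey="P#..." rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">2388</Value>
</Row>
<Row rowKey="P#..." rowActionAvailability="0 0 0">
<Value columnId="PAGE_NAME" comparisonSpecifier="A">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
<Value columnId="PAGE_VIEWS" comparisonSpecifier="A">1313</Value>
</Row>
</Rows></Data></ReportDataResponse>"""

# Bare pattern, no surrounding slashes -- this form is what belongs
# in the DIH regex attribute.
PART_CODE = re.compile(r'^PRODUCT:.*\((.*?)\)$')

extracted = []
root = ET.fromstring(SAMPLE)
for row in root.findall('./Data/Rows/Row'):
    name = row.findtext("Value[@columnId='PAGE_NAME']")
    views = row.findtext("Value[@columnId='PAGE_VIEWS']")
    extracted.append((PART_CODE.sub(r'\1', name), views))

print(extracted)  # → [('W4537', '2388'), ('BE9000', '1313')]
```

If this extracts the part codes but DIH still updates 0 documents, the
slash delimiters in the regex attribute are the first thing to remove.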

From logs:

INFO: Loading DIH Configuration: data-config.xml
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig

INFO: Data Configuration loaded successfully
Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import} 
status=0 QTime=2
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport

INFO: Starting Full Import
Aug 24, 2012 3:53:10 PM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties

INFO: Read dataimport.properties
Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2 
deleteAll

INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource 
getData
FINE: Accessing URL: 
https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*&username=***&format=XML&userAuthKey=**&language=en_US&viewID=9475540&period_a=M20110930

Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=1

Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=1

Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport params={command=status} 
status=0 QTime=0

Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
INFO: [ssww] webapp=/solr path=/dataimport 

Re: How do I represent a group of custom key/value pairs

2012-08-24 Thread Jack Krupansky

The general rule in Solr is simple: denormalize your data.

If you have some maps (or tables) and a set of keys (columns) for each map 
(table), define fields with names like "map-name_key-name", such as 
"map1_name", "map2_name", "map1_field1", "map2_field1". Solr has dynamic 
fields, so you can define "map-name_*" to have a desired type - if all the 
keys have the same type.
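
A small sketch of that denormalizing step on the client side (a hypothetical
helper, not a SolrJ API; the key sanitizing is an added assumption, since raw
keys like "some unknown key" contain spaces, which Solr field names tolerate
poorly):

```python
import re

def flatten_map(map_name, data):
    """Turn one map's entries into flat Solr field names of the form
    <map-name>_<key>, matching a dynamic-field rule like "map1_*"."""
    doc = {}
    for key, value in data.items():
        # Replace runs of non-word characters with underscores so the
        # result is a safe field name.
        safe_key = re.sub(r'\W+', '_', key).strip('_').lower()
        doc[f"{map_name}_{safe_key}"] = value
    return doc

doc = flatten_map("map1", {
    "some unknown key": "some unknown data",
    "another unknown key": "more unknown data",
})
print(doc)
# → {'map1_some_unknown_key': 'some unknown data',
#    'map1_another_unknown_key': 'more unknown data'}
```

With a schema rule like <dynamicField name="map1_*" type="text_general"
indexed="true" stored="true"/>, each flattened key becomes searchable
without being declared up front.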


-- Jack Krupansky

-Original Message- 
From: Sheldon P

Sent: Friday, August 24, 2012 3:33 PM
To: solr-user@lucene.apache.org
Subject: How do I represent a group of custom key/value pairs




Re: How do I represent a group of custom key/value pairs

2012-08-24 Thread Sheldon P
Thanks for the prompt reply Jack.  Could you point me towards any code
examples of that technique?


On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky j...@basetechnology.com wrote:
 The general rule in Solr is simple: denormalize your data.

 If you have some maps (or tables) and a set of keys (columns) for each map
 (table), define fields with names like map-name_key-name, such as
 map1_name, map2_name, map1_field1, map2_field1. Solr has dynamic
 fields, so you can define map-name_* to have a desired type - if all the
 keys have the same type.

 -- Jack Krupansky



RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-24 Thread Fuad Efendi
Any news? 
CC: Dev


-Original Message-
Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

Hi there,

Load Term Info shows 3650 for a specific term "MyTerm", but when I execute
the query channel:MyTerm it shows 650 documents found… possibly a bug… it
still happens after I commit data, nothing changes; and this field is a
single-valued, non-tokenized string.

-Fuad

--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca