Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread Alexandre Rafalovitch
Try referencing the jar directly (by absolute path) with a <lib> statement
in solrconfig.xml (and reloading the core).
The DIH example shipped with Solr shows how it works.
This will help to see if the problem is with not finding the jar or something else.
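
For example, a directive along these lines (the jar path below is just a
placeholder; point it at wherever the driver jar actually lives):

  <!-- in solrconfig.xml, next to any other <lib> directives -->
  <lib path="/full/path/to/neo4j-jdbc-driver.jar" />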

Regards,
   Alex.

On Wed, 9 Oct 2019 at 09:14, Erick Erickson  wrote:
>
> Try starting Solr with the “-v” option. That will echo all the jars that are 
> loaded and the paths.
>
> Where _exactly_ is the jar file? You say “in the lib folder of my core”, but 
> that leaves a lot of room for interpretation.
>
> Are you running stand-alone or SolrCloud? Exactly how do you start Solr?
>
> Details matter
>
> Best,
> Erick
>
> > On Oct 9, 2019, at 3:07 AM, guptavaibhav35  wrote:
> >
> > Hi,
> > Kindly help me solve an issue I am facing when connecting Neo4j with Solr. I am
> > seeing this error in my log file even though I have the Neo4j driver jar
> > in the lib folder of my core.
> >
> > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> > org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
> > load driver: org.neo4j.jdbc.Driver Processing Document # 1
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
> >   at
> > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> >   at
> > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> >   at
> > org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> >   at java.base/java.lang.Thread.run(Thread.java:835)
> > Caused by: java.lang.RuntimeException:
> > org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
> > load driver: org.neo4j.jdbc.Driver Processing Document # 1
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> >   ... 4 more
> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> > Could not load driver: org.neo4j.jdbc.Driver Processing Document # 1
> >   at
> > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
> >   at
> > org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:159)
> >   at
> > org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:80)
> >   at
> > org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:397)
> >   at
> > org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:100)
> >   at
> > org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:53)
> >   at
> > org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:77)
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:434)
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> >   ... 6 more
> > Caused by: java.lang.ClassNotFoundException: Unable to load
> > org.neo4j.jdbc.Driver or
> > org.apache.solr.handler.dataimport.org.neo4j.jdbc.Driver
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
> >   at
> > org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:157)
> >   ... 13 more
> > Caused by: org.apache.solr.common.SolrException: Error loading class
> > 'org.neo4j.jdbc.Driver'
> >   at
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:557)
> >   at
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
> >   at
> > org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:926)
> >   ... 14 more
> > Caused by: java.lang.ClassNotFoundException: org.neo4j.jdbc.Driver
> >   at 
> > java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:436)
> >   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
> >   at
> > java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:864)
> >   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
> >   at java.base/java.lang.Class.forName0(Native Method)
> >   at java.base/java.lang.Class.forName(Class.java:415)
> >   at
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:541)
> >   ... 16 more
> >
> >
> >
> > --
> > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread Erick Erickson
Try starting Solr with the “-v” option. That will echo all the jars that are 
loaded and the paths.

Where _exactly_ is the jar file? You say “in the lib folder of my core”, but 
that leaves a lot of room for interpretation.

Are you running stand-alone or SolrCloud? Exactly how do you start Solr?

Details matter

Best,
Erick

> On Oct 9, 2019, at 3:07 AM, guptavaibhav35  wrote:
> 
> Hi,
> Kindly help me solve an issue I am facing when connecting Neo4j with Solr. I am
> seeing this error in my log file even though I have the Neo4j driver jar
> in the lib folder of my core.
> 
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
> load driver: org.neo4j.jdbc.Driver Processing Document # 1
>   at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>   at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>   at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>   at
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>   at java.base/java.lang.Thread.run(Thread.java:835)
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
> load driver: org.neo4j.jdbc.Driver Processing Document # 1
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>   ... 4 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Could not load driver: org.neo4j.jdbc.Driver Processing Document # 1
>   at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
>   at
> org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:159)
>   at
> org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:80)
>   at
> org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:397)
>   at
> org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:100)
>   at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:53)
>   at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:77)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:434)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>   ... 6 more
> Caused by: java.lang.ClassNotFoundException: Unable to load
> org.neo4j.jdbc.Driver or
> org.apache.solr.handler.dataimport.org.neo4j.jdbc.Driver
>   at
> org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
>   at
> org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:157)
>   ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'org.neo4j.jdbc.Driver'
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:557)
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
>   at
> org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:926)
>   ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.neo4j.jdbc.Driver
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:436)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
>   at
> java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:864)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:415)
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:541)
>   ... 16 more
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2019-10-09 Thread guptavaibhav35
Hi,
Kindly help me solve an issue I am facing when connecting Neo4j with Solr. I am
seeing this error in my log file even though I have the Neo4j driver jar
in the lib folder of my core.

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: org.neo4j.jdbc.Driver Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: org.neo4j.jdbc.Driver Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Could not load driver: org.neo4j.jdbc.Driver Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:159)
at
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:80)
at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:397)
at
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:100)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:53)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:77)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:434)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
... 6 more
Caused by: java.lang.ClassNotFoundException: Unable to load
org.neo4j.jdbc.Driver or
org.apache.solr.handler.dataimport.org.neo4j.jdbc.Driver
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:157)
... 13 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'org.neo4j.jdbc.Driver'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:557)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:926)
... 14 more
Caused by: java.lang.ClassNotFoundException: org.neo4j.jdbc.Driver
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:436)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at
java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:864)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:415)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:541)
... 16 more



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
I wonder if you have some sort of JDBC pool enabled and/or the number
of worker threads is configured differently. Compare tomcat level
configuration and/or try thread dump of the java runtime when you are
stuck.

Or maybe something similar on the Postgres side.

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:36, Srinivas Kashyap  wrote:
>
> Hi,
> Hi,
>
> 1)Have you tried running _just_ your SQL queries to see how long they take to 
> respond and whether it responds with the full result set of batches
>
> The 9th request returns only 2 rows. This behaviour is happening for all the 
> cores which have more than 8 SQL requests. But the same is working fine with 
> AWS hosting. Really baffled.
>
> Thanks and Regards,
> Srinivas Kashyap
>
> -Original Message-
> From: Erick Erickson 
> Sent: 31 July 2019 08:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Dataimport problem
>
> This code is a little old, but should give you a place to start:
>
> https://lucidworks.com/post/indexing-with-solrj/
>
> As for DIH, my guess is that when you moved to Azure, your connectivity to 
> the DB changed, possibly the driver Solr uses etc., and your SQL query in 
> step 9 went from, maybe, batching rows to returning the entire result set or 
> similar weirdness. Have you tried running _just_ your SQL queries to see how 
> long they take to respond and whether it responds with the full result set of 
> batches?
>
> Best,
> Erick
>
> > On Jul 31, 2019, at 10:18 AM, Srinivas Kashyap  
> > wrote:
> >
> > Hi,
> >
> > 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> > running an old version of Solr. Which one?
> >
> > We are using Solr 5.2.1(WAR based deployment so)
> >
> >
> > 5) DIH is not actually recommended for production, more for exploration; 
> > you may want to consider moving to a stronger architecture given the 
> > complexity of your needs
> >
> > Can you please give pointers to look into? We are using DIH in production and 
> > facing a few issues, so we need to start phasing it out.
> >
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> >
> > -Original Message-
> > From: Alexandre Rafalovitch 
> > Sent: 31 July 2019 07:41 PM
> > To: solr-user 
> > Subject: Re: Dataimport problem
> >
> > A couple of things:
> > 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> > running an old version of Solr. Which one?
> > 2) Compare that you have the same Solr config. In Admin UI, there will be 
> > all O/S variables passed to the Java runtime, I would check them 
> > side-by-side
> > 3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run 
> > a subset (1?) of the queries and see the difference
> > 4) Worst case, you may want to track this in between Solr and DB by using 
> > network analyzer (e.g. Wireshark). That may show you the actual queries, 
> > timing, connection issues, etc
> > 5) DIH is not actually recommended for production, more for exploration; 
> > you may want to consider moving to a stronger architecture given the 
> > complexity of your needs
> >
> > Regards,
> >   Alex.
> >
> > On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  
> > wrote:
> >>
> >> Hello,
> >>
> >> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> >> DB. When I run full import(my core has 18 SQL queries), for some reason, 
> >> the requests will go till 9 and it gets hung for eternity.
> >>
> >> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> >> hosting.
> >>
> >> Am I missing some configuration? Please let me know.
> >>
> >> Thanks and Regards,
> >> Srinivas Kashyap
> >> 
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,
Hi,

1)Have you tried running _just_ your SQL queries to see how long they take to 
respond and whether it responds with the full result set of batches

The 9th request returns only 2 rows. This behaviour is happening for all the 
cores which have more than 8 SQL requests. But the same is working fine with 
AWS hosting. Really baffled.

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Erick Erickson 
Sent: 31 July 2019 08:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport problem

This code is a little old, but should give you a place to start:

https://lucidworks.com/post/indexing-with-solrj/

As for DIH, my guess is that when you moved to Azure, your connectivity to the 
DB changed, possibly the driver Solr uses etc., and your SQL query in step 9 
went from, maybe, batching rows to returning the entire result set or similar 
weirdness. Have you tried running _just_ your SQL queries to see how long they 
take to respond and whether it responds with the full result set of batches?

Best,
Erick

> On Jul 31, 2019, at 10:18 AM, Srinivas Kashyap  
> wrote:
>
> Hi,
>
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
>
> We are using Solr 5.2.1(WAR based deployment so)
>
>
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Can you please give pointers to look into? We are using DIH in production 
> and facing a few issues, so we need to start phasing it out.
>
>
> Thanks and Regards,
> Srinivas Kashyap
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: 31 July 2019 07:41 PM
> To: solr-user 
> Subject: Re: Dataimport problem
>
> A couple of things:
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
> 2) Compare that you have the same Solr config. In Admin UI, there will be all 
> O/S variables passed to the Java runtime, I would check them side-by-side
> 3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
> subset (1?) of the queries and see the difference
> 4) Worst case, you may want to track this in between Solr and DB by using 
> network analyzer (e.g. Wireshark). That may show you the actual queries, 
> timing, connection issues, etc
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Regards,
>   Alex.
>
> On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  
> wrote:
>>
>> Hello,
>>
>> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
>> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
>> requests will go till 9 and it gets hung for eternity.
>>
>> But the same setup, solr(tomcat) and postgres database works fine with AWS 
>> hosting.
>>
>> Am I missing some configuration? Please let me know.
>>
>> Thanks and Regards,
>> Srinivas Kashyap
>> 

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,

1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?

We are using Solr 5.2.1(WAR based deployment so)


5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Can you please give pointers to look into? We are using DIH in production and 
facing a few issues, so we need to start phasing it out.


Thanks and Regards,
Srinivas Kashyap
            
-Original Message-
From: Alexandre Rafalovitch  
Sent: 31 July 2019 07:41 PM
To: solr-user 
Subject: Re: Dataimport problem

A couple of things:
1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?
2) Compare that you have the same Solr config. In Admin UI, there will be all 
O/S variables passed to the Java runtime, I would check them side-by-side
3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
subset (1?) of the queries and see the difference
4) Worst case, you may want to track this in between Solr and DB by using 
network analyzer (e.g. Wireshark). That may show you the actual queries, 
timing, connection issues, etc
5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  wrote:
>
> Hello,
>
> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
> requests will go till 9 and it gets hung for eternity.
>
> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> hosting.
>
> Am I missing some configuration? Please let me know.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
A couple of things:
1) Solr on Tomcat has not been an option for quite a while. So, you
must be running an old version of Solr. Which one?
2) Compare that you have the same Solr config. In Admin UI, there will
be all O/S variables passed to the Java runtime, I would check them
side-by-side
3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you
can run a subset (1?) of the queries and see the difference
4) Worst case, you may want to track this in between Solr and DB by
using network analyzer (e.g. Wireshark). That may show you the actual
queries, timing, connection issues, etc
5) DIH is not actually recommended for production, more for
exploration; you may want to consider moving to a stronger
architecture given the complexity of your needs

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  wrote:
>
> Hello,
>
> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
> requests will go till 9 and it gets hung for eternity.
>
> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> hosting.
>
> Am I missing some configuration? Please let me know.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


Re: dataimport for full-import

2019-03-29 Thread Alexandre Rafalovitch
It is probably the autocommit setting in your solrconfig.xml.
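
For example, settings along these lines (the values are only illustrative)
keep the deletes from becoming visible until the import issues its final
commit:

  <autoCommit>
    <maxTime>600000</maxTime>          <!-- hard commit every 10 minutes -->
    <openSearcher>false</openSearcher> <!-- flush to disk without opening a new searcher -->
  </autoCommit>

(A configured <autoSoftCommit> would still open new searchers and expose the
deletes, so check that too.)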

But you may also want to consider indexing into a new core and then doing a
core swap at the end. Or re-aliasing if you are running a multiCore
collection.

Regards,
 Alex

On Fri, Mar 29, 2019, 2:25 AM 黄云尧,  wrote:

> When I do the full-import, it may take about 1 hour, but the old documents
> are deleted after about 10 minutes, so queries return nothing in the meantime.
> What can I do to control when the old documents are deleted, so that they are
> kept longer?
>
>
>
>


Re: Dataimport UI - shows green even when import fails

2018-11-30 Thread Jan Høydahl
I have seen the same if the JDBC jar is not found, you cannot tell from the UI, 
you have to go to Solr logs. We should fix this!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 30. nov. 2018 kl. 00:46 skrev Shawn Heisey :
> 
> I'm looking into a problem where the admin UI dataimport screen has a green 
> status summary bar even though an import has failed.
> 
> Here's a screenshot:
> 
> https://www.dropbox.com/s/94baej11nn75746/dih-green-import-failed.png?dl=0
> 
> What I did to get this:
> Downloaded 7.5.0.
> Extracted the archive.
> On windows 10, in a command prompt at the root of the extracted files:
> bin\solr -e dih
> 
> Then I edited the DIH config for the "db" core, changing the URL to this 
> (just added a "2" near the end):
> 
> url="jdbc:hsqldb:${solr.install.dir}/example/example-DIH/hsqldb/e2x"
> 
> Once that was done, I just clicked the "Execute" button in the dataimport UI 
> for the db core.  The import failed, because the database name in the 
> modified URL doesn't exist.  But the page still shows the status summary in 
> green, with a green check mark.  The screenshot shows "Full Import failed" in 
> the raw status output.  A quick glance at this page will leave a typical user 
> with the incorrect impression that everything is fine with their import.
> 
> I thought I should just go ahead and file a bug, but before I do that, I'd 
> like to know if I should have expected something different here.
> 
> There's been a lot of issues on problems with the fact that the DIH status 
> response is extremely difficult for computers to parse.  It's probably just 
> as hard for the admin UI to parse as it is for most users.  I once wrote some 
> SolrJ code to handle parsing that response.  There was so much code that it 
> needed its own class.
> 
> https://issues.apache.org/jira/browse/SOLR-2728
> https://issues.apache.org/jira/browse/SOLR-2729
> https://issues.apache.org/jira/browse/SOLR-3319
> https://issues.apache.org/jira/browse/SOLR-3689
> https://issues.apache.org/jira/browse/SOLR-4241
> 
> Thanks,
> Shawn
> 



Re: Dataimport not working on solrcloud

2018-08-21 Thread Shawn Heisey

On 8/20/2018 10:00 PM, Sushant Vengurlekar wrote:

I have a dataimport working on standalone solr instance but the same
doesn't work on solrcloud. I keep on hitting this error

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
Exception in invoking url


There will be more to this error than what you've shared. Look in 
solr.log, and share all the ERROR/WARN entries from the correct 
timeframe.  Some of them can be quite long.  We will need *all* of that 
information.  Will also need the exact Solr version.



The url is returning well formed xml. I have verified that. The solr nodes
can fully resolve this url. I checked that out. I have the following params
set in xml-import.xml

connectionTimeout="50" readTimeout="5000"


We'll need to see the full dataimport config and the handler config from 
solrconfig.xml.


Thanks,
Shawn



Re: Dataimport performance

2018-06-07 Thread Shawn Heisey

On 6/7/2018 12:19 AM, kotekaman wrote:

sorry. may i know how to code it?


Code *what*?

Here's the same wiki page that I gave you for your last message:

https://wiki.apache.org/solr/UsingMailingLists

Even if I go to the Nabble website and discover that you've replied to a 
topic that's SEVEN AND A HALF YEARS OLD, that information doesn't help 
me understand exactly what it is you want to know.  The previous 
information in the topic is a question and answer about what kind of 
performance can be expected from the dataimport handler.  There's 
nothing about coding in it.


Thanks,
Shawn



Re: Dataimport performance

2018-06-07 Thread kotekaman
sorry. may i know how to code it?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Dataimport handler showing idle status with multiple shards

2017-12-05 Thread Sarah Weissman


From: Shawn Heisey <elyog...@elyograg.org>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Tuesday, December 5, 2017 at 1:31 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: Re: Dataimport handler showing idle status with multiple shards

On 12/5/2017 10:47 AM, Sarah Weissman wrote:
I’ve recently been using the dataimport handler to import records from a 
database into a Solr cloud collection with multiple shards. I have 6 dataimport 
handlers configured on 6 different paths all running simultaneously against the 
same DB. I’ve noticed that when I do this I often get “idle” status from the 
DIH even when the import is still running. The percentage of the time I get an 
“idle” response seems proportional to the number of shards. I.e., with 1 shard 
it always shows me non-idle status, with 2 shards I see idle about half the 
time I check the status, with 96 shards it seems to be showing idle almost all 
the time. I can see the size of each shard increasing, so I’m sure the import 
is still going.

I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. 
Does anyone know why the DIH would report idle when it’s running?

e.g.:
curl http://myserver:8983/solr/collection/dataimport6



To use DIH with SolrCloud, you should be sending your request directly
to a shard replica core, not the collection, so that you can be
absolutely certain that the import command and the status command are
going to the same place.  You MIGHT need to also have a distrib=false
parameter on the request, but I do not know whether that is required to
prevent the load balancing on the dataimport handler.



Thanks for the information, Shawn. I am relatively new to Solr cloud and I am 
used to running the dataimport from the admin dashboard, where it happens at 
the collection level, so I find it surprising that the right way to do this is 
at the core level. So, if I want to be able to check the status of my data 
import for N cores I would need to create N different data import configs that 
manually partition the collection and start each different config on a 
different core? That seems like it could get confusing. And then if I wanted to 
grow or shrink my shards I’d have to rejigger my data import configs every 
time. I kind of expect a distributed index to hide these details from me.

I only have one node at the moment, and I don’t understand how Solr cloud works 
internally well enough to understand what it means for the data import to be 
running on a shard vs. a node. It would be nice if doing a status query would 
at least tell you something, like the number of documents last indexed on that 
core, even if nothing is currently running. That way at least I could 
extrapolate how much longer the operation will take.



Re: Dataimport handler showing idle status with multiple shards

2017-12-05 Thread Shawn Heisey

On 12/5/2017 10:47 AM, Sarah Weissman wrote:

I’ve recently been using the dataimport handler to import records from a 
database into a Solr cloud collection with multiple shards. I have 6 dataimport 
handlers configured on 6 different paths all running simultaneously against the 
same DB. I’ve noticed that when I do this I often get “idle” status from the 
DIH even when the import is still running. The percentage of the time I get an 
“idle” response seems proportional to the number of shards. I.e., with 1 shard 
it always shows me non-idle status, with 2 shards I see idle about half the 
time I check the status, with 96 shards it seems to be showing idle almost all 
the time. I can see the size of each shard increasing, so I’m sure the import 
is still going.

I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. 
Does anyone know why the DIH would report idle when it’s running?

e.g.:
curl http://myserver:8983/solr/collection/dataimport6


When you send a DIH request to the collection name, SolrCloud is going 
to load balance that request across the cloud, just like it would with 
any other request.  Solr will look at the list of all responding nodes 
that host part of the collection and send multiple such requests to 
different cores (shards/replicas) across the cloud.  If there are four 
cores in the collection and the nodes hosting them are all working, then 
each of those cores would only see requests to /dataimport about one 
fourth of the time.


DIH imports happen at the core level, NOT the collection level, so when 
you start an import on a collection with four cores in the cloud, only 
one of those four cores is actually going to be doing the import, the 
rest of them are idle.


This behavior should happen with any version, so I would expect it in 
6.1 as well as 7.1.


To use DIH with SolrCloud, you should be sending your request directly 
to a shard replica core, not the collection, so that you can be 
absolutely certain that the import command and the status command are 
going to the same place.  You MIGHT need to also have a distrib=false 
parameter on the request, but I do not know whether that is required to 
prevent the load balancing on the dataimport handler.


A similar question came to this list two days ago, and I replied to that 
one yesterday.


http://lucene.472066.n3.nabble.com/Dataimporter-status-tp4365602p4365879.html

Somebody did open an issue a LONG time ago about this problem:

https://issues.apache.org/jira/browse/SOLR-3666

I just commented on the issue.

Thanks,
Shawn



RE: DataImport Handler Out of Memory

2017-09-27 Thread Allison, Timothy B.
https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
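
The short version of that FAQ entry: set batchSize="-1" on the JDBC data source
so the MySQL driver streams rows instead of buffering the whole result set.
Roughly (URL and credentials below are placeholders):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              batchSize="-1"
              user="user" password="pass"/>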


-Original Message-
From: Deeksha Sharma [mailto:dsha...@flexera.com] 
Sent: Wednesday, September 27, 2017 1:40 PM
To: solr-user@lucene.apache.org
Subject: DataImport Handler Out of Memory

I am trying to create indexes using the dataimport handler (Solr 5.2.1). The data is in 
a MySQL db and there are more than 3.5 million records. My Solr server stops due to an 
OOM (out of memory) error. I tried starting Solr with 12GB of RAM but still no luck.


Also, I see that Solr fetches all the documents in one request. Is there a way to 
configure Solr to stream the data from the DB, or any other solution someone may 
have tried?


Note: When my records are nearly 2 Million, I am able to create indexes by 
giving Solr 10GB of RAM.


Your help is appreciated.



Thanks

Deeksha




Re: dataimport to a smaller Solr farm

2017-03-22 Thread Mikhail Khludnev
Hello, Dean.

DIH is shard agnostic. How do you try to specify "a shard from the new
collection"?

On Tue, Mar 21, 2017 at 8:24 PM, deansg  wrote:

> Hello,
> My team often uses the /dataimport & /dih handlers to move items from one
> Solr collection to another. However, all the times we did that, the number
> of shards in the new collection was always the same or higher than in the
> old.
> Can /dataimport work if I have less shards in the new collection than in
> the
> old one? I tried specifying a shard from the new collection multiple times
> in the data-config file, and it didn't seem to work - there were no visible
> exceptions, but most items simply didn't enter the new collection.
> Dean.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/dataimport-to-a-smaller-Solr-farm-tp4326067.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: DataImport-Solr6: Nested Entities

2016-08-19 Thread Shawn Heisey
On 8/18/2016 5:10 PM, Peri Subrahmanya wrote:
> Hi,
>
> I have a simple one-to-many relationship setup in the data-import.xml and 
> when I try to index it using the dataImportHandler, Solr complains of “no 
> unique id found”. 
>
> managed-schema.xml
> id
> solrconfig.xml:
> 
>   
> id
>   

>  query=“select blah blah from course where 
> catalog_id=‘${catalog.catalog_id}'">
> 

Can you get the full error message(s) from the solr.log file, including
the full java stacktrace(s)?  Many error messages are dozens of lines
long, because they include Java stacktraces.  For correct
interpretation, we also need the exact version of Solr that you're
running.  Your subject indicates Solr6, but there are three releases so
far in the 6.x series.

If you want your update processor chain to be used by DIH, I think you
need to make it the default chain with 'default="true"' in the opening
tag.  There might be a way to apply a specific update chain in DIH, but
if there is, you need to give it a name, which yours doesn't have.
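
For illustration, a chain marked as the default looks roughly like this (the
name and the processor list are placeholders, not your actual chain):

  <updateRequestProcessorChain name="mychain" default="true">
    <!-- custom processors go here, before the two standard ones below -->
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>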

I am using a custom update chain with both DIH and explicit update
requests, which I do like this:



Thanks,
Shawn



Re: DataImport-Solr6: Nested Entities

2016-08-18 Thread Alexandre Rafalovitch
Well, do both parent and child entity have a field called 'id'
containing their corresponding unique ids? That would be the first
step.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 19 August 2016 at 09:10, Peri Subrahmanya
 wrote:
> Hi,
>
> I have a simple one-to-many relationship setup in the data-import.xml and 
> when I try to index it using the dataImportHandler, Solr complains of “no 
> unique id found”.
>
> managed-schema.xml
> id
> solrconfig.xml:
> 
>   
> id
>   
>   
>   
> 
>
> data-import.xml
> 
>  driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://x.x.x.x:3306/xx"
> user=“blah"
> password=“blah"/>
>
> 
>  query="select blah blah from catalog">
>
>  query=“select blah blah from course where 
> catalog_id=‘${catalog.catalog_id}'">
> 
>
> 
> 
>
> 
>
> Could someone please advise?
>
> Thanks
> -Peri


RE: dataimport db-data-config.xml

2016-04-29 Thread Davis, Daniel (NIH/NLM) [C]
Kishor,

Data Import Handler doesn't know how to randomly access rows from the CSV to 
"JOIN" them to rows from the MySQL table at indexing time.
However, both MySQL and Solr know how to JOIN rows/documents from multiple 
tables/collections/cores.

Data Import Handler could read the CSV first, and query MySQL within that, but 
I don't think that's a great architecture because it depends on the business 
requirements in a rather brittle way (more on this below).

So, I see three basic architectures:

Use MySQL to do the JOIN:
--
- Your indexing isn't just DIH, but a script that first imports the CSV into a 
MySQL table, validating that each id in the CSV is found in the MySQL table.
- Your DIH has either an <entity> for one SQL query that contains a child <entity> 
for the other SQL query, or it has a single JOIN query (or a query on a MySQL 
view), sketched just below.
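
A sketch of the single-JOIN-query variant (table and column names are made up):

  <entity name="item"
          query="select m.id, m.title, c.extra_col
                 from my_table m join csv_import c on c.id = m.id"/>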

This is ideal if:
- Your resources (including you) are more familiar with RDBMS technology than 
Solr.
- You have no business requirement to return rows from just the MySQL table or 
just the CSV as search results.
- The data is small enough that the processing time to import into MySQL each 
time you index is acceptable.

Use Solr to do the JOIN:
--
- Index all the rows from the CSV as documents within Solr, 
- Index all the rows from the MySQL table as documents within Solr,
- Use JOIN queries to query them together.

This is ideal if:
- You don't control the MySQL database, and have no way at all to add a table 
to it.
- You have a business requirement to return either or both results from the 
MySQL table or the CSV.
- You want Solr JOIN queries on your Solr resume ;)   Not a terribly good 
reason, I guess.


Use Data Import Handler to do the JOIN:
---
If you absolutely want to join the data using Data Import Handler, then:
- Have DIH loop through the CSV *first*, and then make queries based on the id 
into the MySQL table.
- In this case, the <entity> for the MySQL query will appear within the 
<entity> for the CSV row, which will appear within an <entity> for the CSV file 
within the filesystem (sketched at the end of this section).
- The <entity> for the CSV row would be the primary document entity.

This is only appropriate if:
- There is no business requirement to search for results directly from the 
MySQL table on its own.
- Your business requirements suggest one result for each row from the CSV, 
rather than from the MySQL table or either way.
- The CSV contains every id in the MySQL table, or the entries within the MySQL 
table that don't have anything from the CSV shouldn't appear in the results 
anyway.
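
A rough sketch of that nesting, with made-up paths and column names (the CSV is
assumed to hold just the id, one per line, to keep the example minimal):

  <dataConfig>
    <dataSource name="db" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/mydb" user="user" password="pass"/>
    <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- outer entity: the CSV file(s) on disk -->
      <entity name="csvFile" processor="FileListEntityProcessor"
              baseDir="/data/import" fileName=".*\.csv"
              rootEntity="false" dataSource="null">
        <!-- middle entity: one document per CSV line; the line is exposed as rawLine -->
        <entity name="csvRow" processor="LineEntityProcessor"
                url="${csvFile.fileAbsolutePath}" dataSource="files"
                rootEntity="true">
          <!-- inner entity: pull the matching row from MySQL by id -->
          <entity name="dbRow" dataSource="db"
                  query="select name, description from my_table where id = '${csvRow.rawLine}'"/>
        </entity>
      </entity>
    </document>
  </dataConfig>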


-Original Message-
From: kishor [mailto:krajus...@gmail.com] 
Sent: Friday, April 29, 2016 4:58 AM
To: solr-user@lucene.apache.org
Subject: dataimport db-data-config.xml

I want to import data from a MySQL table and a CSV file at the same time, because 
some data are in MySQL tables and some are in the CSV file. I want to match a 
specific id from the MySQL table in the CSV file and then add the data to Solr.

What I think or want to do:








   




   

Is this possible in Solr?

Please suggest how to import data from a CSV and a MySQL table at the same time.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-db-data-config-xml-tp4270673p4273614.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimport db-data-config.xml

2016-04-17 Thread Reth RM
What are the errors reported? Errors can be seen either on the admin page
Logging tab or in the log file under solr_home.
If you follow the steps mentioned on the blog precisely, it should mostly
work:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/

If you encounter errors at any step, let us know.
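
For reference, a minimal data-config.xml with two root entities sharing one data
source looks roughly like this (the table, column, and entity names other than
"user1" are made up; the connection details echo the ones in the question below):

  <dataConfig>
    <dataSource driver="org.postgresql.Driver"
                url="jdbc:postgresql://0.0.0.0:5432/iboats"
                user="iboats" password="root"/>
    <document>
      <entity name="user1" transformer="TemplateTransformer"
              query="select id, name from users">
        <field column="id" template="user1-${user1.id}"/>
      </entity>
      <entity name="boat" transformer="TemplateTransformer"
              query="select id, title from boats">
        <field column="id" template="boat-${boat.id}"/>
      </entity>
    </document>
  </dataConfig>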




On Sat, Apr 16, 2016 at 10:49 AM, kishor  wrote:

> I am trying to run two pgsql queries on the same data source. Is this possible in
> db-data-config.xml?
>
>
> 
>
>  url="jdbc:postgresql://0.0.0.0:5432/iboats"
> user="iboats"
> password="root" />
>
> 
>  transformer="TemplateTransformer">
>
>  template="user1-${user1.id}"/>
>
>
> This code is not working; please suggest more examples.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/dataimport-db-data-config-xml-tp4270673.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Dataimport failing on SOLR 5.2

2015-06-08 Thread William Bell
https://issues.apache.org/jira/browse/SOLR-7588

On Mon, Jun 8, 2015 at 2:11 AM, William Bell billnb...@gmail.com wrote:

 Uncaught ReferenceError: naturalSort is not defined
 $.ajax.success @ dataimport.js?_=5.2.0:48
 jQuery.Callbacks.fire @ require.js?_=5.2.0:3126
 jQuery.Callbacks.self.fireWith @ require.js?_=5.2.0:3244
 done @ require.js?_=5.2.0:9482
 jQuery.ajaxTransport.send.callback @ require.js?_=5.2.0:10263

 On Mon, Jun 8, 2015 at 2:10 AM, William Bell billnb...@gmail.com wrote:

 Also getting:

 Uncaught ReferenceError: naturalSort is not defined


 On Mon, Jun 8, 2015 at 1:50 AM, William Bell billnb...@gmail.com wrote:


1. When I click DataImport in the UI on any core.. The UI just
spins:
2.
3.

 http://hgsolr2devmstr:8983/solr/autosuggest/admin/mbeans?cat=QUERYHANDLER&wt=json&_=1433749812067

 Here is the response: 200:

 {responseHeader:{status:0,QTime:1},solr-mbeans:[QUERYHANDLER,{allcondalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autospecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpracproc:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomecondnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},allprocalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autopayorportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespec:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinhomepracspecnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoall2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespeclocation:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpraccond:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin:{class:org.apache.solr.handler.admin.AdminHandlers,version:5.2.0,description:Register
 Standard Admin
 Handlers,src:null},autocondportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoeduportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/debug/dump:{class:org.apache.solr.handler.DumpRequestHandler,version:5.2.0,description:Dump
 handler
 (debug),src:null},/admin/mbeans:{class:org.apache.solr.handler.admin.SolrInfoMBeanHandler,version:5.2.0,description:Get
 Info (and statistics) for registered
 SolrInfoMBeans,src:null},exactmatch2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json/docs:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},autopracall:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopraccond2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopracproc2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/properties:{class:org.apache.solr.handler.admin.PropertiesRequestHandler,version:5.2.0,description:Get
 System
 Properties,src:null},autocohort:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/threads:{class:org.apache.solr.handler.admin.ThreadDumpHandler,version:5.2.0,description:Thread
 Dump,src:null},/query:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomeprocnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/csv:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinautopracspecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 

Re: Dataimport failing on SOLR 5.2

2015-06-08 Thread William Bell
Also getting:

Uncaught ReferenceError: naturalSort is not defined


On Mon, Jun 8, 2015 at 1:50 AM, William Bell billnb...@gmail.com wrote:


1. When I click DataImport in the UI on any core.. The UI just spins:
2.
3.

 http://hgsolr2devmstr:8983/solr/autosuggest/admin/mbeans?cat=QUERYHANDLER&wt=json&_=1433749812067

 Here is the response: 200:

 {responseHeader:{status:0,QTime:1},solr-mbeans:[QUERYHANDLER,{allcondalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autospecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpracproc:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomecondnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},allprocalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autopayorportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespec:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinhomepracspecnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoall2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespeclocation:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpraccond:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin:{class:org.apache.solr.handler.admin.AdminHandlers,version:5.2.0,description:Register
 Standard Admin
 Handlers,src:null},autocondportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoeduportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/debug/dump:{class:org.apache.solr.handler.DumpRequestHandler,version:5.2.0,description:Dump
 handler
 (debug),src:null},/admin/mbeans:{class:org.apache.solr.handler.admin.SolrInfoMBeanHandler,version:5.2.0,description:Get
 Info (and statistics) for registered
 SolrInfoMBeans,src:null},exactmatch2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json/docs:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},autopracall:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopraccond2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopracproc2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/properties:{class:org.apache.solr.handler.admin.PropertiesRequestHandler,version:5.2.0,description:Get
 System
 Properties,src:null},autocohort:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/threads:{class:org.apache.solr.handler.admin.ThreadDumpHandler,version:5.2.0,description:Thread
 Dump,src:null},/query:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomeprocnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/csv:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinautopracspecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autopractall2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},allspec:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/replication:{class:org.apache.solr.handler.ReplicationHandler,version:5.2.0,description:ReplicationHandler
 provides replication of index and configuration files from Master to
 Slaves,src:null},autoprocportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:

Re: Dataimport failing on SOLR 5.2

2015-06-08 Thread William Bell
Uncaught ReferenceError: naturalSort is not defined
$.ajax.success @ dataimport.js?_=5.2.0:48
jQuery.Callbacks.fire @ require.js?_=5.2.0:3126
jQuery.Callbacks.self.fireWith @ require.js?_=5.2.0:3244
done @ require.js?_=5.2.0:9482
jQuery.ajaxTransport.send.callback @ require.js?_=5.2.0:10263

On Mon, Jun 8, 2015 at 2:10 AM, William Bell billnb...@gmail.com wrote:

 Also getting:

 Uncaught ReferenceError: naturalSort is not defined


 On Mon, Jun 8, 2015 at 1:50 AM, William Bell billnb...@gmail.com wrote:


1. When I click DataImport in the UI on any core.. The UI just
spins:
2.
3.

 http://hgsolr2devmstr:8983/solr/autosuggest/admin/mbeans?cat=QUERYHANDLER&wt=json&_=1433749812067

 Here is the response: 200:

 {responseHeader:{status:0,QTime:1},solr-mbeans:[QUERYHANDLER,{allcondalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autospecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpracproc:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomecondnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},allprocalpha:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autopayorportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespec:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinhomepracspecnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoall2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomespeclocation:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autosuggestallpraccond:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin:{class:org.apache.solr.handler.admin.AdminHandlers,version:5.2.0,description:Register
 Standard Admin
 Handlers,src:null},autocondportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autoeduportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/debug/dump:{class:org.apache.solr.handler.DumpRequestHandler,version:5.2.0,description:Dump
 handler
 (debug),src:null},/admin/mbeans:{class:org.apache.solr.handler.admin.SolrInfoMBeanHandler,version:5.2.0,description:Get
 Info (and statistics) for registered
 SolrInfoMBeans,src:null},exactmatch2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/json/docs:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},autopracall:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopraccond2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinautopracproc2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/properties:{class:org.apache.solr.handler.admin.PropertiesRequestHandler,version:5.2.0,description:Get
 System
 Properties,src:null},autocohort:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/admin/threads:{class:org.apache.solr.handler.admin.ThreadDumpHandler,version:5.2.0,description:Thread
 Dump,src:null},/query:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},joinhomeprocnopt:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},/update/csv:{class:org.apache.solr.handler.UpdateRequestHandler,version:5.2.0,description:Add
 documents using XML (with XSLT), CSV, JSON, or
 javabin,src:null},joinautopracspecportal:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 ,src:null},autopractall2:{class:org.apache.solr.handler.component.SearchHandler,version:5.2.0,description:Search
 using components:
 

Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-13 Thread Mikhail Khludnev
Hello O,
It seems to me (but it's better to look at the heap histogram) that
buffering sub-entities in SortedMapBackedCache blows the heap.
I'm aware of two directions:
- use a file-based cache instead. I don't know exactly how it works; you can
start from https://issues.apache.org/jira/browse/SOLR-2382 and check how to
enable the BerkeleyDB cache;
- personally, I'm promoting merging resultsets ordered by the RDBMS:
https://issues.apache.org/jira/browse/SOLR-4799




On Fri, May 9, 2014 at 7:16 PM, O. Olson olson_...@yahoo.it wrote:

 I have a Data Schema which is Hierarchical i.e. I have an Entity and a
 number
 of attributes. For a small subset of the Data - about 300 MB, I can do the
 import with 3 GB memory. Now with the entire 4 GB Dataset, I find I cannot
 do the import with 9 GB of memory.
 I am using the SqlEntityProcessor as below:

 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
       url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
   <document>
     <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
       <field column="EntID" name="EntID" />
       <field column="Image" name="Image" />

       <entity name="EntityAttribute1"
               query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
               cacheKey="EntID"
               cacheLookup="Entity.EntID"
               processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
         <field column="AttributeValue" name="EntityAttribute1" />
       </entity>
       <entity name="EntityAttribute2"
               query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
               cacheKey="EntID"
               cacheLookup="Entity.EntID"
               processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
         <field column="AttributeValue" name="EntityAttribute2" />
       </entity>

     </entity>
   </document>
 </dataConfig>



 What is the best way to import this data? Doing it without a cache results
 in many SQL queries. With the cache, I run out of memory.

 I’m curious why 4GB of data cannot entirely fit in memory. One thing I need
 to mention is that I have about 400 to 500 attributes.

 Thanks in advance for any helpful advice.
 O. O.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-12 Thread Shawn Heisey
On 5/9/2014 9:16 AM, O. Olson wrote:
 I have a Data Schema which is Hierarchical i.e. I have an Entity and a number
 of attributes. For a small subset of the Data - about 300 MB, I can do the
 import with 3 GB memory. Now with the entire 4 GB Dataset, I find I cannot
 do the import with 9 GB of memory. 
 I am using the SqlEntityProcessor as below: 
 
 <dataConfig>
 <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
 url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>

Upgrade your JDBC driver to 1.2 or later, or turn on response buffering.
 The following URL has this information.  It's a very long URL, so if
your mail client wraps it, you may not be able to click on it properly:

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F
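
For reference, a minimal sketch of a dataSource tuned that way (this assumes
the sqljdbc 1.2+ driver, which supports the responseBuffering property; core
and credentials are unchanged from the original post, and the batchSize value
is arbitrary):

  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;responseBuffering=adaptive;"
              batchSize="1000"/>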

Thanks,
Shawn



Re: Dataimport handler Date

2014-03-06 Thread Gora Mohanty
On 7 March 2014 08:50, Pritesh Patel priteshpate...@gmail.com wrote:
 I'm using the dataimporthandler to index data from a mysql DB.  Been
 running it just fine. I've been using full-imports. I'm now trying
 implement the delta import functionality.

 To implement the delta query, you need to be reading the last_index_time
 from a properties file to know what is new to index.  So I'm using the
 parameter:
 {dataimporter.last_index_time} within my query.

 The problem is when I use this, the date always is : Thu Jan 01 00:00:00
 UTC 1970.  It's never actually reading the correct date stored in the
 dataimport.properties file.
[...]

I take it that you have verified that the dataimport.properties file exists.
What are its contents?

Please share the exact DIH configuration file that you use, obfuscating
DB password/username. Your cut-and-paste seems to have a syntax
error in the deltaQuery (notice the 'jgkg' string):
deltaQuery="SELECT node.nid from node where node.type = 'news' and
node.status = 1 and (node.changed &gt;
UNIX_TIMESTAMP('${
dataimporter.last_index_time}'jgkg) or node.created &gt;
UNIX_TIMESTAMP('${dataimporter.last_index_time}'))"

What response do you get from the delta-import URL?
Are there any error messages in your Solr log?

Regards,
Gora


Re: dataimport handler

2014-01-22 Thread Shalin Shekhar Mangar
I'm guessing that id in your schema.xml is also a unique key field.
If so, each document must have an id field or Solr will refuse to
index them.

DataImportHandler will map the id field in your table to Solr schema's
id field only if you have not specified a mapping.
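
For example, a minimal sketch that aliases the chosen column to the schema's
id in the SQL itself, so the table's own id column is never picked up
implicitly (column names taken from the post below; everything else is an
assumption):

  <entity name="test_table"
          query="select column1 as id, column2 as name from test_table">
  </entity>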

On Thu, Jan 23, 2014 at 3:01 AM, tom praveen...@yahoo.com wrote:
 Hi,
 I am trying to use dataimporthandler(Solr 4.6) from oracle database, but I
 have some issues in mapping the data.
 I have 3 columns in the test_table,
  column1,
  column2,
  id

 dataconfig.xml

   <entity name="test_table"
           query="select * from test_table">
     <field column="column1" name="id" />
     <field column="column2" name="name" />
   </entity>

 Issue is,
 - if I remove the id column from the table, index fails, solr is looking for
 id column even though it is not mapped in dataconfig.xml.
 - if I add, it directly maps the id column from the db to solr id, it
 ignores the column1, even though it is mapped.

 my problem is I don't have an ID in every table; I should be able to map the
 column I choose from the table to the Solr id. Any solution will be greatly
 appreciated.

 `Tom




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/dataimport-handler-tp4112830.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed

2013-12-22 Thread Shawn Heisey
On 12/22/2013 9:51 AM, William Pierce wrote:
 My configurations works nicely with solr 4.4. I am encountering a 
 configuration error when I try to upgrade from 4.4 to 4.6.  All I did was the 
 following:
 
 a) Replace the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib 
 folder. I am using version 6.0.36 of tomcat.
 b) I replaced the solr-dataimporthandler-4.4.0.jar and 
 solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6 
 counterparts in the collection/lib folder.
 
 I restarted tomcat.   I get the following stack trace (full log is also given 
 below) – there are no other warnings/errors in my log.  I have gone through 
 the 4.5 changes to see if I needed to add/modify my DIH configuration – but I 
 am stymied.  Any help will be greatly appreciated.
 
 ERROR - 2013-12-22 08:05:09.824; 
 org.apache.solr.handler.dataimport.DataImportHandler; Exception while loading 
 DataImporter
 java.lang.NoSuchMethodError: 
 org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;

The method it's complaining about not being there is
org.apache.solr.core.SolrCore.getLatestSchema() ... which is in Solr
itself, not the dataimport handler.  I did some checking.  This method
did not exist before 4.4.0, so my best guess is that your classloader is
loading a SolrCore class from 4.3.1 or earlier, which probably means one
of two things: 1) The Solr war you're extracting is not actually version
4.6.0, or 2) you've got jars in your system from one or more older versions.

It's a good idea to delete the extracted war data whenever you upgrade
Solr -- stop the container, delete the extracted data and all old jars,
then replace the .war file and start it back up.

Thanks,
Shawn



Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed

2013-12-22 Thread William Bell
The best practice for upgrading is to take the distribution and expand it.
Then take your cores and replace them.

Then you are guaranteed to get the jars and not have other WARs/JARs
hanging around.



On Sun, Dec 22, 2013 at 7:24 PM, Shawn Heisey s...@elyograg.org wrote:

 On 12/22/2013 9:51 AM, William Pierce wrote:
  My configurations works nicely with solr 4.4. I am encountering a
 configuration error when I try to upgrade from 4.4 to 4.6.  All I did was
 the following:
 
  a) Replace the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib
 folder. I am using version 6.0.36 of tomcat.
  b) I replaced the solr-dataimporthandler-4.4.0.jar and
 solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6
 counterparts in the collection/lib folder.
 
  I restarted tomcat.   I get the following stack trace (full log is also
 given below) – there are no other warnings/errors in my log.  I have gone
 through the 4.5 changes to see if I needed to add/modify my DIH
 configuration – but I am stymied.  Any help will be greatly appreciated.
 
  ERROR - 2013-12-22 08:05:09.824;
 org.apache.solr.handler.dataimport.DataImportHandler; Exception while
 loading DataImporter
  java.lang.NoSuchMethodError:
 org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;

 The method it's complaining about not being there is
 org.apache.solr.core.SolrCore.getLatestSchema() ... which is in Solr
 itself, not the dataimport handler.  I did some checking.  This method
 did not exist before 4.4.0, so my best guess is that your classloader is
 loading a SolrCore class from 4.3.1 or earlier, which probably means one
 of two things: 1) The Solr war you're extracting is not actually version
 4.6.0, or 2) you've got jars in your system from one or more older
 versions.

 It's a good idea to delete the extracted war data whenever you upgrade
 Solr -- stop the container, delete the extracted data and all old jars,
 then replace the .war file and start it back up.

 Thanks,
 Shawn




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: DataImport Handler, writing a new EntityProcessor

2013-12-19 Thread Mathias Lux
Hi!

Thanks for all the advice! I finally did it, the most annoying error
that took me the best part of a day to figure out was that the state
variable here had to be reset:
https://bitbucket.org/dermotte/liresolr/src/d27878a71c63842cb72b84162b599d99c4408965/src/main/java/net/semanticmetadata/lire/solr/LireEntityProcessor.java?at=master#cl-56

The EntityProcessor is part of this image search plugin if anyone is
interested: https://bitbucket.org/dermotte/liresolr/

:) It's always the small things that are hard to find

cheers and thanks, Mathias

On Wed, Dec 18, 2013 at 7:26 PM, P Williams
williams.tricia.l...@gmail.com wrote:
 Hi Mathias,

 I'd recommend testing one thing at a time.  See if you can get it to work
 for one image before you try a directory of images.  Also try testing using
 the solr-testframework using your ide (I use Eclipse) to debug rather than
 your browser/print statements.  Hopefully that will give you some more
 specific knowledge of what's happening around your plugin.

 I also wrote an EntityProcessor plugin to read from a properties
 filehttps://issues.apache.org/jira/browse/SOLR-3928.
  Hopefully that'll give you some insight about this kind of Solr plugin and
 testing them.

 Cheers,
 Tricia




 On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.atwrote:

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try and run it not in debug mode.  DIH's 
debug mode limits the number of documents it will take in, so that might be all 
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias 
Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor

Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec



Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Mathias Lux
Unfortunately it is the same in non-debug, just the first document. I
also output the params to sout, but it seems only the first one is
ever arriving at my custom class. I've the feeling that I'm doing
something seriously wrong here, based on a complete misunderstanding
:) I basically assume that the nested entity processor will be called
for each of the rows that come out from its parent. I've read
somewhere, that the data has to be taken from the data source, and
I've implemented that, but it doesn't seem to change anything.

cheers,
Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James
james.d...@ingramcontent.com wrote:
 The first thing I would suggest is to try and run it not in debug mode.  
 DIH's debug mode limits the number of documents it will take in, so that 
 might be all that is wrong here.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of 
 Mathias Lux
 Sent: Wednesday, December 18, 2013 4:04 AM
 To: solr-user@lucene.apache.org
 Subject: DataImport Handler, writing a new EntityProcessor

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread P Williams
Hi Mathias,

I'd recommend testing one thing at a time.  See if you can get it to work
for one image before you try a directory of images.  Also try testing using
the solr-testframework using your ide (I use Eclipse) to debug rather than
your browser/print statements.  Hopefully that will give you some more
specific knowledge of what's happening around your plugin.

I also wrote an EntityProcessor plugin to read from a properties
filehttps://issues.apache.org/jira/browse/SOLR-3928.
 Hopefully that'll give you some insight about this kind of Solr plugin and
testing them.

Cheers,
Tricia




On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.atwrote:

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec



Re: dataimport handler

2013-05-10 Thread Shalin Shekhar Mangar
Hmm, I will fix.

https://issues.apache.org/jira/browse/SOLR-4788


On Thu, May 9, 2013 at 8:35 PM, William Bell billnb...@gmail.com wrote:

 It does not work anymore in 4.x.

 ${dih.last_index_time} does work, but the entity version does not.

 Bill



 On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  Using ${dih.entity_name.last_index_time} should work. Make sure you put
  it in quotes in your query.
 
 
  On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com
 wrote:
 
   In the  data import handler  I have multiple entities.  Each one
   generates a date in the
   dataimport.properties i.e. entityname.last_index_time.
  
   How do I reference the specific entity time in my delta queries?
  
   Thanks
  
   Eric
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076




-- 
Regards,
Shalin Shekhar Mangar.


Re: dataimport handler

2013-05-09 Thread William Bell
It does not work anymore in 4.x.

${dih.last_index_time} does work, but the entity version does not.

Bill



On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Using ${dih.entity_name.last_index_time} should work. Make sure you put
 it in quotes in your query.


 On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote:

  In the  data import handler  I have multiple entities.  Each one
  generates a date in the
  dataimport.properties i.e. entityname.last_index_time.
 
  How do I reference the specific entity time in my delta queries?
 
  Thanks
 
  Eric
 



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: dataimport handler

2013-05-07 Thread Shalin Shekhar Mangar
Using ${dih.entity_name.last_index_time} should work. Make sure you put
it in quotes in your query.
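
For illustration, a sketch with a hypothetical "item" entity (table and
column names are made up; note the single quotes around the property):

  <entity name="item"
          query="select id, name from item"
          deltaQuery="select id from item
                      where last_modified &gt; '${dih.item.last_index_time}'"
          deltaImportQuery="select id, name from item where id = '${dih.delta.id}'">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
  </entity>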


On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote:

 In the  data import handler  I have multiple entities.  Each one
 generates a date in the
 dataimport.properties i.e. entityname.last_index_time.

 How do I reference the specific entity time in my delta queries?

 Thanks

 Eric




-- 
Regards,
Shalin Shekhar Mangar.


Re: Dataimport handler

2013-04-23 Thread William Bell
I also get this. 4.2+


On Fri, Apr 19, 2013 at 10:43 PM, Eric Myers badllam...@gmail.com wrote:

 I have multiple parallel entities in my document and when I run an import
 there are times like
 xxx.last_index_time
 where xxx is the name of the entity.

 I tried accessing these using dih.xxx.last_index_time but receive a null
 value.

 Is there a way to reference these in my queries.

 Thanks




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: dataimport

2013-03-31 Thread Stefan Matheis
Hey

It will never turn green since we have no explicit status for the Importer
when it's done. But what do you see when you hit the Refresh button at the
bottom of the page? Are the numbers counting?

Stefan 


On Friday, March 29, 2013 at 5:38 PM, A. Lotfi wrote:

 Hi,
 
 When I hit Execute button in Query tab I only see :
 
 Last Update: 12:34:58
 Indexing since 01s
 Requests: 1 (1/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
 Started: about an hour ago
 
 did not see  any green entry saying Indexing Completed.
 
  Thanks 



RE: DataImport Handler : Transformer Function Eval Failed Error

2012-11-05 Thread Mishra, Shikhar
Looks like it will be helpful. I'm going to give it a shot. Thanks, Otis.

Shikhar

From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Friday, November 02, 2012 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Would http://wiki.apache.org/solr/Join do anything for you?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Nov 2, 2012 at 10:06 AM, Mishra, Shikhar 
shikhar.mis...@telcobuy.com wrote:

 We have a scenario where the same products are available from multiple
 vendors at different prices. We want to store these prices along with the
 products in the index (product has many prices), so that we can apply
 dynamic filtering on the prices at the time of search.


 Thanks,
 Shikhar

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Thursday, November 01, 2012 8:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

 Hi,

 That looks a little painful... what are you trying to achieve by storing
 JSON in there? Maybe there's a simpler way to get there...

 Otis
 --
 Performance Monitoring - http://sematext.com/spm On Nov 1, 2012 6:16 PM,
 Mishra, Shikhar shikhar.mis...@telcobuy.com
 wrote:

  Hi,
 
  I'm trying to store a list of JSON objects as stored value for the
  field prices (see below).
 
  I'm getting the following error from the custom transformer function
  (see the data-config file at the end) of data import handler.
 
  Error Message
 
  --
  - Caused by:
  org.apache.solr.handler.dataimport.DataImportHandlerException:
  'eval' failed with language: JavaScript and script:
  function vendorPrices(row){
 
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
 
  //Below approach fails
  var prices = [];
 
  prices.push({'vendor':vendorName});
  prices.push({'wwtCost':wwtCost});
  prices.push({'listPrice':listPrice});
 
  row.put('prices':prices);
 
  //Below approach works
  //row.put('prices', '{' + 'vendor:' + vendorName +
  ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
  return row;
  } Processing Document # 1
  at
  org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
  hrow(DataImportHandlerException.java:71)
 
  Data Import Handler Configuration File dataConfig
 
  script
  ![CDATA[
  function vendorPrices(row){
 
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
 
  //Below approach fails
  var prices = [];
 
  prices.push({'vendor':vendorName});
  prices.push({'wwtCost':wwtCost});
  prices.push({'listPrice':listPrice});
 
  row.put('prices':prices);
 
  //Below approach works
  //row.put('prices', '{' + 'vendor:' + vendorName +
  ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
  return row;
  }
  ]]
  /script
 
  dataSource driver=oracle.jdbc.driver.OracleDriver
  url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=
  rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=
  ERP_GENERAL.SOMR.ORG))) user=dummy password=xx/
  document
  entity name=item query=select * from
  wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where
  prod.mfg_id = mfg.mfg_id and prod.mfg_product_number =
 'CON-CBO2-B22HPF'
  field column=PRODUCT_ID name=id /
  field column=MFG_PRODUCT_NUMBER name=name /
  field column=MFG_PRODUCT_NUMBER name=nameSort /
  field column=MFG_NAME name=manu /
  field column=MFG_ITEM_NUMBER name=alphaNameSort /
  field column=DESCRIPTION name=features /
  field column=DESCRIPTION name=description /
 
  entity name=vendor_sources
  transformer=script:vendorPrices query=SELECT PRICE.WWT_COST,
  PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME,
  AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod,
  wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend,
  wwt_catalog.wwt_product_availability avail WHERE  PROD.PRODUCT_ID =
  price.product_id(+) AND price.vendor_id

RE: DataImport Handler : Transformer Function Eval Failed Error

2012-11-02 Thread Mishra, Shikhar
We have a scenario where the same products are available from multiple vendors 
at different prices. We want to store these prices along with the products in 
the index (product has many prices), so that we can apply dynamic filtering on 
the prices at the time of search.


Thanks,
Shikhar

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Thursday, November 01, 2012 8:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Hi,

That looks a little painful... what are you trying to achieve by storing JSON 
in there? Maybe there's a simpler way to get there...

Otis
--
Performance Monitoring - http://sematext.com/spm On Nov 1, 2012 6:16 PM, 
Mishra, Shikhar shikhar.mis...@telcobuy.com
wrote:

 Hi,

 I'm trying to store a list of JSON objects as stored value for the 
 field prices (see below).

 I'm getting the following error from the custom transformer function 
 (see the data-config file at the end) of data import handler.

 Error Message

 --
 - Caused by: 
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 'eval' failed with language: JavaScript and script:
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + 
 ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 } Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
 hrow(DataImportHandlerException.java:71)

 Data Import Handler Configuration File dataConfig

 script
 ![CDATA[
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + 
 ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 }
 ]]
 /script

 dataSource driver=oracle.jdbc.driver.OracleDriver
 url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=
 rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=
 ERP_GENERAL.SOMR.ORG))) user=dummy password=xx/
 document
 entity name=item query=select * from 
 wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where 
 prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'
 field column=PRODUCT_ID name=id /
 field column=MFG_PRODUCT_NUMBER name=name /
 field column=MFG_PRODUCT_NUMBER name=nameSort /
 field column=MFG_NAME name=manu /
 field column=MFG_ITEM_NUMBER name=alphaNameSort /
 field column=DESCRIPTION name=features /
 field column=DESCRIPTION name=description /

 entity name=vendor_sources
 transformer=script:vendorPrices query=SELECT PRICE.WWT_COST, 
 PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, 
 AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod, 
 wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend, 
 wwt_catalog.wwt_product_availability avail WHERE  PROD.PRODUCT_ID = 
 price.product_id(+) AND price.vendor_id =
 vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND 
 PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID = 
 '${item.PRODUCT_ID}'

 /entity
 /entity

 /document
 /dataConfig


 Are there any syntactic errors in the JavaScript code above? Thanks.

 Shikhar





Re: DataImport Handler : Transformer Function Eval Failed Error

2012-11-01 Thread Otis Gospodnetic
Hi,

That looks a little painful... what are you trying to achieve by storing
JSON in there? Maybe there's a simpler way to get there...

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com
wrote:

 Hi,

 I'm trying to store a list of JSON objects as stored value for the field
 prices (see below).

 I'm getting the following error from the custom transformer function (see
 the data-config file at the end) of data import handler.

 Error Message

 ---
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 'eval' failed with language: JavaScript and script:
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + ',
 ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 } Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)

 Data Import Handler Configuration File
 dataConfig

 script
 ![CDATA[
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + ',
 ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 }
 ]]
 /script

 dataSource driver=oracle.jdbc.driver.OracleDriver
 url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=
 rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=
 ERP_GENERAL.SOMR.ORG))) user=dummy password=xx/
 document
 entity name=item query=select * from wwt_catalog.wwt_product
 prod, wwt_catalog.wwt_manufacturer mfg where prod.mfg_id = mfg.mfg_id and
 prod.mfg_product_number = 'CON-CBO2-B22HPF'
 field column=PRODUCT_ID name=id /
 field column=MFG_PRODUCT_NUMBER name=name /
 field column=MFG_PRODUCT_NUMBER name=nameSort /
 field column=MFG_NAME name=manu /
 field column=MFG_ITEM_NUMBER name=alphaNameSort /
 field column=DESCRIPTION name=features /
 field column=DESCRIPTION name=description /

 entity name=vendor_sources
 transformer=script:vendorPrices query=SELECT PRICE.WWT_COST,
 PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, AVAIL.QTY_AVAILABLE
 FROM wwt_catalog.wwt_product prod, wwt_catalog.wwt_product_pricing price,
 wwt_catalog.wwt_vendor vend, wwt_catalog.wwt_product_availability avail
 WHERE  PROD.PRODUCT_ID = price.product_id(+) AND price.vendor_id =
 vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND
 PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID =
 '${item.PRODUCT_ID}'

 /entity
 /entity

 /document
 /dataConfig


 Are there any syntactic errors in the JavaScript code above? Thanks.

 Shikhar





RE: Dataimport Handler in solr 3.6.1

2012-08-30 Thread Dyer, James
There were 2 major changes to DIH Cache functionality in Solr 3.6, only 1 of 
which was carried to Solr 4.0:

- Solr 3.6 had 2 MAJOR changes:

1. We support pluggable caches so that you can write your own cache 
implementations and cache however you want.  The goal here is to allow you to 
cache to disk when you had to do large, complex joins and an in-memory cache 
could result in an OOM.  Also, you can specify cacheImpl with any 
EntityProcessor, not just SqlEntityProcessor.  So you can join child entities 
that come from XML, flat files, etc.  CachedSqlEntityProcessor is technically 
deprecated as using it is the same as SqlEntityProcessor with 
cacheImpl=SortedMapBackedCache specified.  This does a simple in-memory cache 
very similar to Solr3.5 and prior. (see 
https://issues.apache.org/jira/browse/SOLR-2382)

2. Extensive work was done to try and make the threads parameter work in more 
situations.  This involved some rather invasive changes to the DIH Cache 
functionality. (see https://issues.apache.org/jira/browse/SOLR-3011)

- Solr 4.0 has #1 above, BUT NOT #2.  Rather the threads functionality was 
entirely removed.

Subsequently, if the problem is due to #2 (SOLR-3011), this isn't as big a 
problem because 3.x users can simply use the 3.5 DIH jar (but some use-cases 
involding threads work with the 3.6(.1) jar and not at all with 3.5, so users 
will have to pick  choose the best version to use for their instance).

My concern is there are issues with #1 (SOLR-2382).  That's why I'm asking if 
at all possible you can try this with SOLR 4.0.  I have tested Solr 4.0 
extensively here and it seems caching works exactly as it ought.  However, DIH 
is flexible in how it can be configured and there could be something that was 
broken that I have not uncovered myself.  Any issues that may exist with 
SOLR-2382 need to be identified and fixed in the 4.x branch as soon as possible.

I apologize for the late response.  I was away the past week.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 21, 2012 7:47 AM
To: solr-user@lucene.apache.org
Subject: RE: Dataimport Handler in solr 3.6.1

Hi James,

Thanks for the suggestions. 

Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned as there are other columns as well.

Actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same and the
indexing was successful. But I wanted this to work with 3.6.1 DIH. Just came
across the SOLR-2382 patch. I tried giving the following 

processor="CachedSqlEntityProcessor" cacheImpl="SortedMapBackedCache"

in my DIH.xml file. In case of static fields in child entities, the indexing
happened fine, but in case of dynamic fields only one of the dynamic fields
was indexed and the rest were skipped, even though the total rows fetched from
the datasource was correct.

Following are my questions

1.) Is there a big difference in solr 3.5 and 3.6.1 DIH handler files? like
is any new feature added in 3.6 DIH that is not present in 3.5?
2.) Am i missing something while giving the cacheImpl=SortedMapBackedCache
in my DIH.xml because of which dynamic fields are not indexed properly?
There is no change to my DIH file from my previous post apart from this
cacheImpl addition and also the dynamic fields are indexed properly if I do
not give this cacheImpl. Am I missing something here?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149p4002421.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Dataimport Handler in solr 3.6.1

2012-08-21 Thread mechravi25
Hi James,

Thanks for the suggestions. 

Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned as there are other columns as well.

Actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same and the
indexing was successful. But I wanted this to work with 3.6.1 DIH. Just came
across the SOLR-2382 patch. I tried giving the following 

processor="CachedSqlEntityProcessor" cacheImpl="SortedMapBackedCache"

in my DIH.xml file. In case of static fields in child entities, the indexing
happened fine, but in case of dynamic fields only one of the dynamic fields
was indexed and the rest were skipped, even though the total rows fetched from
the datasource was correct.

Following are my questions

1.) Is there a big difference in solr 3.5 and 3.6.1 DIH handler files? like
is any new feature added in 3.6 DIH that is not present in 3.5?
2.) Am i missing something while giving the cacheImpl=SortedMapBackedCache
in my DIH.xml because of which dynamic fields are not indexed properly?
There is no change to my DIH file from my previous post apart from this
cacheImpl addition and also the dynamic fields are indexed properly if I do
not give this cacheImpl. Am I missing something here?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149p4002421.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Dataimport Handler in solr 3.6.1

2012-08-14 Thread Dyer, James
One thing I notice in your configuration...the child entity has this:

cacheLookup=ent1.uid

but your parent entity doesn't have a uid field.  

Also, you have these 3 transformers:  
RegexTransformer,DateFormatTransformer,TemplateTransformer

but none of your columns seem to make use of these.  Are you sure you need them?

In any case I am suspicious there may still be bugs in 3.6.1 related to 
CachedSqlEntityProcessor, so if you are able to create a failing unit test and 
post it to JIRA that would be helpful.  If you need to, you can use the 3.5 DIH 
jar with Solr 3.6.1.  Also, I do not think the SOLR-3360 should affect you 
unless you're using the threads parameter.  Both SOLR-3360 & SOLR-3430 fixed 
bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (from 
SOLR-3411 and SOLR-2482 respectively).

Finally, if you are at all able to test this on 4.0-beta, I would greatly 
appreciate it!  SOLR-3411/SOLR-3360 were never applied to version 4.0 because 
threads support was removed entirely.  However, SOLR-2482/SOLR-3430 were 
applied to 4.0 also.  If we have any more SOLR-2482 bugs lingering in 4.0 these 
really need to be fixed so any testing help would be much appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 14, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using dataimport handler files in solr 3.6.1. I am using
a nested entity in my handler file.
I noticed a scenario wherein, instead of only the records which are to be fetched
for a document,
all the records present in the table are indexed.

Following is the ideal scenario of how the data has to be indexed.
For a document A, I am trying to index the 2 values B,C as a multivalued
field

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
</related_id>

This is how the output should be. I have used the same DIH file for solr
1.4,3.5 versions 
and the data was indexed fine like the one mentioned above in both the
versions.

But in solr 3.6.1 version, data was indexed differently. In my table, there
are 4 values(B,C,D,E) in related_id field.
This is how the data is indexed in 3.6.1

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
  <str>D</str>
  <str>E</str>
</related_id>

Ideally, the values D and E should not get indexed under id A. This is the
same for the other id records.


Following is the content of the DIH file



<entity name="ent1" query="select sid as id Table1 a"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

  <field column="id" name="id" boost="0.5"/>

  <entity name="ent2" query="select id1,rid from Table2"
          processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
          transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

    <field column="rid" name="related_id"/>

  </entity>

</entity>



 I tried changing the CachedSqlEntityProcessor to SqlEntityProcessor and
then indexed the same but still I faced the same issue.
 
 When I googled a bit, I found this url
https://issues.apache.org/jira/browse/SOLR-3360


I am not sure if the issue 3360 is the same as the scenario as I have
mentioned above.

Please guide me.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-13 Thread Erick Erickson
You could also just keep a special document in your index with a known
ID that contains meta-data fields. If this document had no fields in common
with any other document it wouldn't satisfy searches (except the *:* search).
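
For example, a sketch of such a marker document posted through the XML update
handler (the id value and the last_indexed_id_l field name are made up; the
_l suffix assumes a dynamic long field in the schema):

  <add>
    <doc>
      <field name="id">!index-metadata</field>
      <field name="last_indexed_id_l">123456789</field>
    </doc>
  </add>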

Or you could store this info somewhere else (file, DB, etc).

Or you can commit with user data, although this isn't exposed
through Solr yet, see:
https://issues.apache.org/jira/browse/SOLR-2701

Best
Erick

On Thu, Jul 12, 2012 at 5:22 AM,  karsten-s...@gmx.de wrote:
 Hi Avenka,

 you asked for a HowTo to add a field inverseID which allows to calculate 
 max(id) from its first term:
 If you do not use solr you have to calculate 1 - id and store it in 
 an extra field inverseID.
 If you fill solr with your own code, add a TrieLongField inverseID and fill 
 with the value -id.
 If you only want to change schema.xml (and add some classes):
   * You need a new FieldType inverseLongType and a Field inverseID of 
 Type inverseLongType
   * You need a line copyField source=id dest=inverseID/
(see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)

 For inverseLongType I see two possibilities
  a) use TextField and make your own filter to calculate 1 - id
  b) extends TrieLongField to a new FieldType InverseTrieLongField with:
   @Override
   public String readableToIndexed(String val) {
 return super.readableToIndexed(Long.toString( -Long.parseLong(val)));
   }
   @Override
   public Fieldable createField(SchemaField field, String externalVal, float 
 boost) {
 return super.createField(field, Long.toString( -Long.parseLong(externalVal)),
 boost );
   }
   @Override
   public Object toObject(Fieldable f) {
 Object result = super.toObject(f);
 if(result instanceof Long){
   return new Long( -((Long)result).longValue());
 }
 return result;
   }

 Best regards
Karsten

 View this message in context:
 http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html


  Original-Nachricht 
 Datum: Wed, 11 Jul 2012 20:59:10 -0700 (PDT)
 Von: avenka ave...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: Re: DataImport using last_indexed_id or getting max(id) quickly

 Thanks. Can you explain more the first TermsComponent option to obtain
 max(id)? Do I have to modify schema.xml to add a new field? How exactly do
 I
 query for the lowest value of 1 - id?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-12 Thread karsten-solr
Hi Avenka,

you asked for a HowTo to add a field inverseID which allows to calculate 
max(id) from its first term:
If you do not use solr you have to calculate 1 - id and store it in 
an extra field inverseID.
If you fill solr with your own code, add a TrieLongField inverseID and fill 
with the value -id.
If you only want to change schema.xml (and add some classes):
  * You need a new FieldType inverseLongType and a Field inverseID of Type 
inverseLongType
  * You need a line copyField source=id dest=inverseID/
   (see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)

For inverseLongType I see two possibilities
 a) use TextField and make your own filter to calculate 1 - id
 b) extends TrieLongField to a new FieldType InverseTrieLongField with:
  @Override
  public String readableToIndexed(String val) {
return super.readableToIndexed(Long.toString( -Long.parseLong(val)));
  }
  @Override
  public Fieldable createField(SchemaField field, String externalVal, float 
boost) {
return super.createField(field, Long.toString( -Long.parseLong(externalVal)), boost 
);
  }
  @Override
  public Object toObject(Fieldable f) {
Object result = super.toObject(f);
if(result instanceof Long){
  return new Long( -((Long)result).longValue());
}
return result;
  }
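
For option b), the matching schema.xml wiring could look roughly like this
(the class package is hypothetical; precisionStep copied from the stock long
type):

  <fieldType name="inverseLongType" class="my.pkg.InverseTrieLongField"
             precisionStep="0"/>
  <field name="inverseID" type="inverseLongType" indexed="true" stored="true"/>
  <copyField source="id" dest="inverseID"/>

A TermsComponent request such as
/terms?terms.fl=inverseID&terms.limit=1&terms.sort=index
should then return the lowest indexed term of inverseID, i.e. the inverse of max(id).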

Best regards
   Karsten

View this message in context:
http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html


 Original-Nachricht 
 Datum: Wed, 11 Jul 2012 20:59:10 -0700 (PDT)
 Von: avenka ave...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: Re: DataImport using last_indexed_id or getting max(id) quickly

 Thanks. Can you explain more the first TermsComponent option to obtain
 max(id)? Do I have to modify schema.xml to add a new field? How exactly do
 I
 query for the lowest value of 1 - id?
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImport using last_indexed_id or getting max(id) quickly

2012-07-11 Thread avenka
Thanks. Can you explain more the first TermsComponent option to obtain
max(id)? Do I have to modify schema.xml to add a new field? How exactly do I
query for the lowest value of 1 - id?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763p3994560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimport handler (DIH) - notify when it has finished?

2012-05-01 Thread Gora Mohanty
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote:
 Hello all,

 is there a notification / trigger / callback mechanism people use that
 allows them to know when a dataimport process has finished?

 we will be doing daily delta-imports and i need some way for an operations
 group to know when the DIH has finished.


Never tried it myself, but this should meet your needs:
http://wiki.apache.org/solr/DataImportHandler#EventListeners
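
Untested sketch of what that looks like in data-config.xml (the listener class
name is hypothetical; it would implement
org.apache.solr.handler.dataimport.EventListener):

  <dataConfig>
    <document onImportStart="com.example.dih.NotifyOpsListener"
              onImportEnd="com.example.dih.NotifyOpsListener">
      <entity name="..." query="..."/>
    </document>
  </dataConfig>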

Regards,
Gora


Re: dataimport indexing fails: where are my log files ? ;-)

2011-10-19 Thread Shawn Heisey

On 10/19/2011 12:42 PM, Fred Zimmerman wrote:

dumb question ...

today I set up solr3.4/example, indexing to 8983 via post is working, so is
search, solr/dataimport reports

<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-10-19 18:13:57</str>
<str name="">Indexing failed. Rolled back all changes.</str>

Google tells me to look at the exception logs to find out what's happening
... but, I can't find the logs!   Where are they? example/logs is an empty
directory.


I believe that if you are running the example Solr without any changes 
related to logging, that information will be dumped to stdout/stderr.  
If you are starting Solr as a daemon or a service, it may be going 
someplace you can't retrieve it.  Start it directly from the commandline 
and/or alter your startup command to redirect stdout/stderr to files.


I hope that's actually helpful!

Thanks,
Shawn



Re: dataimport

2011-03-09 Thread Brian Lamb
This has since been fixed. The problem was that there was not enough memory
on the machine. It works just fine now.

On Tue, Mar 8, 2011 at 6:22 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : INFO: Creating a connection for entity id with URL:
 :
 jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8zeroDateTimeBehavior=convertToNull
 : Feb 24, 2011 8:58:25 PM
 org.apache.solr.handler.dataimport.JdbcDataSource$1
 : call
 : INFO: Time taken for getConnection(): 137
 : Killed
 :
 : So it looks like for whatever reason, the server crashes trying to do a
 full
 : import. When I add a LIMIT clause on the query, it works fine when the
 LIMIT
 : is only 250 records but if I try to do 500 records, I get the same
 message.

 ...wow.  that's ... weird.

 I've never seen a java process just log Killed like that.

 The only time i've ever seen a process log Killed is if it was
 terminated by the os (ie: kill -9 pid)

 What OS are you using? how are you running solr? (ie: are you using the
 simple jetty example java -jar start.jar or are you using a different
 servlet container?) ... are you absolutely certain your machine doesn't
 have some sort of monitoring in place that kills jobs if they take too
 long, or use too much CPU?


 -Hoss



Re: dataimport

2011-03-09 Thread Adam Estrada
Brian,

I had the same problem a while back and set the JAVA_OPTS env variable
to something my machine could handle. That may also be an option for
you going forward.

Adam

On Wed, Mar 9, 2011 at 9:33 AM, Brian Lamb
brian.l...@journalexperts.com wrote:
 This has since been fixed. The problem was that there was not enough memory
 on the machine. It works just fine now.

 On Tue, Mar 8, 2011 at 6:22 PM, Chris Hostetter 
 hossman_luc...@fucit.orgwrote:


 : INFO: Creating a connection for entity id with URL:
 :
 jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8zeroDateTimeBehavior=convertToNull
 : Feb 24, 2011 8:58:25 PM
 org.apache.solr.handler.dataimport.JdbcDataSource$1
 : call
 : INFO: Time taken for getConnection(): 137
 : Killed
 :
 : So it looks like for whatever reason, the server crashes trying to do a
 full
 : import. When I add a LIMIT clause on the query, it works fine when the
 LIMIT
 : is only 250 records but if I try to do 500 records, I get the same
 message.

 ...wow.  that's ... weird.

 I've never seen a java process just log Killed like that.

 The only time i've ever seen a process log Killed is if it was
 terminated by the os (ie: kill -9 pid)

 What OS are you using? how are you running solr? (ie: are you using the
 simple jetty example java -jar start.jar or are you using a different
 servlet container?) ... are you absolutely certain your machine doesn't
 have some sort of monitoring in place that kills jobs if they take too
 long, or use too much CPU?


 -Hoss




Re: dataimport

2011-03-08 Thread Chris Hostetter

: INFO: Creating a connection for entity id with URL:
: 
jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8zeroDateTimeBehavior=convertToNull
: Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
: call
: INFO: Time taken for getConnection(): 137
: Killed
: 
: So it looks like for whatever reason, the server crashes trying to do a full
: import. When I add a LIMIT clause on the query, it works fine when the LIMIT
: is only 250 records but if I try to do 500 records, I get the same message.

...wow.  that's ... weird.

I've never seen a java process just log Killed like that.

The only time i've ever seen a process log Killed is if it was 
terminated by the os (ie: kill -9 pid)

What OS are you using? how are you running solr? (ie: are you using the 
simple jetty example java -jar start.jar or are you using a different 
servlet container?) ... are you absolutely certain your machine doesn't 
have some sort of monitoring in place that kills jobs if they take too 
long, or use too much CPU?


-Hoss


Re: Dataimport performance

2010-12-19 Thread Alexey Serba
 With subquery and with left join:   320k in 6 Min 30
It's 820 records per second. It's _really_ impressive considering the
fact that DIH performs separate sql query for every record in your
case.

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import.
Sub-entities slow down data import indeed. You can try to avoid a
separate query for every row by using CachedSqlEntityProcessor. There
are a couple of options - 1) you can load all sub-entity data in memory,
or 2) you can reduce the number of SQL queries by caching sub-entity
data per id. There's no silver bullet and each option has its own pros
and cons.

Also Ephraim proposed a really neat solution with GROUP_CONCAT, but
I'm not sure that all RDBMS-es support that.
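
Roughly, that approach folds the artists sub-entity into the parent query,
e.g. (MySQL syntax; table/column names taken from Robert's config quoted
below, the separator choice is arbitrary, and the extra template fields are
omitted):

  <entity name="track"
          query="select t.id as id, t.title as title, l.title as label,
                 group_concat(a.name separator '|') as artist
                 from track t
                 left join label l on (l.id = t.label_id)
                 left join track_artist ta on (ta.track_id = t.id)
                 left join artist a on (a.id = ta.artist_id)
                 where t.deleted = 0
                 group by t.id, t.title, l.title"
          transformer="RegexTransformer">
    <field column="artist" name="artists_t" splitBy="\|" />
  </entity>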


2010/12/15 Robert Gründler rob...@dubture.com:
 i've benchmarked the import already with 500k records, one time without the 
 artists subquery, and one time without the join in the main query:


 Without subquery: 500k in 3 min 30 sec

 Without join and without subquery: 500k in 2 min 30.

 With subquery and with left join:   320k in 6 Min 30


 so the joins / subqueries are definitely a bottleneck.

 How exactly did you implement the custom data import?

 In our case, we need to de-normalize the relations of the sql data for the 
 index,
 so i fear i can't really get rid of the join / subquery.


 -robert





 On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):

       <entity name="track" query="select t.id as id, t.title as title,
               l.title as label from track t left join label l on (l.id = t.label_id)
               where t.deleted = 0" transformer="TemplateTransformer">
         <field column="title" name="title_t" />
         <field column="label" name="label_t" />
         <field column="id" name="sf_meta_id" />
         <field column="metaclass" template="Track" name="sf_meta_class"/>
         <field column="metaid" template="${track.id}" name="sf_meta_id"/>
         <field column="uniqueid" template="Track_${track.id}" name="sf_unique_id"/>

         <entity name="artists" query="select a.name as artist from artist a
                 left join track_artist ta on (ta.artist_id = a.id) where
                 ta.track_id=${track.id}">
           <field column="artist" name="artists_t" />
         </entity>

       </entity>

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).

 Just as a test, how long does it take if you comment out the artists entity?




Re: Dataimport performance

2010-12-19 Thread Lukas Kahwe Smith

On 19.12.2010, at 23:30, Alexey Serba wrote:

 
 Also Ephraim proposed a really neat solution with GROUP_CONCAT, but
 I'm not sure that all RDBMS-es support that.


That's MySQL-only syntax.
But if you google you can find similar solutions for other RDBMSes.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
Check out 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
This approach of not using sub entities really improved our load time.
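
(For reference, a rough sketch of the idea as I understand it from the thread -
the GROUP_CONCAT variant mentioned elsewhere here; this is not the exact code
from the linked post, and table, column and connection names are illustrative.)

// Hedged sketch of the no-sub-entity approach: fold the sub-entity into one
// delimited column with MySQL's GROUP_CONCAT, so only a single query runs,
// then split the column while building documents.
import java.sql.*;

public class GroupConcatImport {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/music", "user", "password");
        Statement st = con.createStatement();
        // note: MySQL truncates GROUP_CONCAT at group_concat_max_len
        // (default 1024), so raise that setting for long artist lists
        ResultSet rs = st.executeQuery(
              "select t.id, t.title, "
            + "       group_concat(a.name separator '|') as artists "
            + "from track t "
            + "left join track_artist ta on ta.track_id = t.id "
            + "left join artist a on a.id = ta.artist_id "
            + "where t.deleted = 0 "
            + "group by t.id, t.title");
        while (rs.next()) {
            String concatenated = rs.getString("artists");
            String[] artists = concatenated == null
                    ? new String[0] : concatenated.split("\\|");
            // ... add one document per track, with 'artists' as a multi-valued field
        }
        rs.close();
        st.close();
        con.close();
    }
}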

Ephraim Ofir

-Original Message-
From: Robert Gründler [mailto:rob...@dubture.com] 
Sent: Wednesday, December 15, 2010 4:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport performance

i've benchmarked the import already with 500k records, one time without the 
artists subquery, and one time without the join in the main query:


Without subquery: 500k in 3 min 30 sec

Without join and without subquery: 500k in 2 min 30.

With subquery and with left join:   320k in 6 Min 30


so the joins / subqueries are definitely a bottleneck. 

How exactly did you implement the custom data import? 

In our case, we need to de-normalize the relations of the sql data for the 
index, 
so i fear i can't really get rid of the join / subquery.


-robert





On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):
 
  entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) where 
 t.deleted = 0 transformer=TemplateTransformer
field column=title name=title_t /
field column=label name=label_t /
field column=id name=sf_meta_id /
field column=metaclass template=Track name=sf_meta_class/
field column=metaid template=${track.id} name=sf_meta_id/
field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/
 
entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
  field column=artist name=artists_t /
/entity
 
  /entity
 
 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).
 
 Just as a test, how long does it take if you comment out the artists entity?



RE: Dataimport performance

2010-12-16 Thread Dyer, James
We have ~50 long-running SQL queries that need to be joined and denormalized.  
Not all of the queries are to the same db, and some data comes from fixed-width 
data feeds.  Our current search engine (that we are converting to SOLR) has a 
fast disk-caching mechanism that lets you cache all of these data sources and 
then it will join them locally prior to indexing.  

I'm in the process of developing something similar for DIH that uses the 
Berkeley DB to do the same thing.  It's good enough that I can do nightly full 
re-indexes of all our data while developing the front-end, but it is still very 
rough.  Possibly I will get this refined enough to eventually submit it 
as a jira ticket / patch, as it seems this is a somewhat common problem that 
needs solving.

Even with our current search engine, the join & denormalize step is always the 
longest-running part of the process.  However, I have it running fairly fast by 
partitioning the data by a modulus of the primary key and then running several 
jobs in parallel.  The trick is not to get I/O bound.  Things run fast if you 
can set it up to maximize CPU.
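
(A rough sketch of that partitioning idea, not our actual code - names,
connection details and the thread count are illustrative; the real work of
denormalizing and indexing happens inside the loop.)

// Hedged sketch: partition by a modulus of the primary key and run the
// partitions in parallel so the job stays CPU-bound rather than I/O-bound.
import java.sql.*;

public class ModulusPartitionedImport {
    static final int PARTITIONS = 4;

    public static void main(String[] args) throws Exception {
        Thread[] workers = new Thread[PARTITIONS];
        for (int p = 0; p < PARTITIONS; p++) {
            final int partition = p;
            workers[p] = new Thread(new Runnable() {
                public void run() {
                    try {
                        Connection con = DriverManager.getConnection(
                                "jdbc:mysql://localhost/music", "user", "password");
                        PreparedStatement ps = con.prepareStatement(
                                "select t.id, t.title from track t "
                              + "where t.deleted = 0 and mod(t.id, ?) = ?");
                        ps.setInt(1, PARTITIONS);
                        ps.setInt(2, partition);
                        ResultSet rs = ps.executeQuery();
                        while (rs.next()) {
                            // ... denormalize and index this slice of the data
                        }
                        rs.close();
                        ps.close();
                        con.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            workers[p].start();
        }
        for (Thread t : workers) t.join();
    }
}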

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ephraim Ofir [mailto:ephra...@icq.com] 
Sent: Thursday, December 16, 2010 3:04 AM
To: solr-user@lucene.apache.org
Subject: RE: Dataimport performance

Check out 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e
This approach of not using sub entities really improved our load time.

Ephraim Ofir

-Original Message-
From: Robert Gründler [mailto:rob...@dubture.com] 
Sent: Wednesday, December 15, 2010 4:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport performance

i've benchmarked the import already with 500k records, one time without the 
artists subquery, and one time without the join in the main query:


Without subquery: 500k in 3 min 30 sec

Without join and without subquery: 500k in 2 min 30.

With subquery and with left join:   320k in 6 Min 30


so the joins / subqueries are definitely a bottleneck. 

How exactly did you implement the custom data import? 

In our case, we need to de-normalize the relations of the sql data for the 
index, 
so i fear i can't really get rid of the join / subquery.


-robert





On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):
 
  entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) where 
 t.deleted = 0 transformer=TemplateTransformer
field column=title name=title_t /
field column=label name=label_t /
field column=id name=sf_meta_id /
field column=metaclass template=Track name=sf_meta_class/
field column=metaid template=${track.id} name=sf_meta_id/
field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/
 
entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
  field column=artist name=artists_t /
/entity
 
  /entity
 
 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).
 
 Just as a test, how long does it take if you comment out the artists entity?



Re: Dataimport performance

2010-12-16 Thread Glen Newton
Hi,

LuSqlv2 beta comes out in the next few weeks, and is designed to
address this issue (among others).

LuSql original 
(http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
now moved to: https://code.google.com/p/lusql/) is a JDBC-->Lucene
high-performance loader.

You may have seen my posts on this list suggesting LuSql as a high
performance alternative to DIH, for a subset of use cases.

LuSqlV2 has evolved into a full extract-transform-load (ETL) high
performance engine, focusing on many of the issues of interest to the
Lucene/SOLR community.
It has a pipelined, pluggable, multithreaded architecture.
It is basically: pluggable source --> 0 or more pluggable filters -->
pluggable sink

Source plugins implemented:
- JDBC, Lucene, SOLR (SolrJ), BDB, CSV, RMI, Java Serialization
Sink plugins implemented:
- JDBC, Lucene, SOLR (SolrJ), BDB, XML, RMI, Java Serialization, Tee,
NullSink [I am working on a memcached Sink]
A number of different filters are implemented (i.e. get a PDF file from the
filesystem based on a SQL field, convert it & get its text, etc.), including:
BDBJoinFilter, JDBCJoinFilter

--

This particular problem is one of the unit tests I have: given a
simple database of:
1- table Name
2- table City
3- table nameCityJoin
4- table Job
5- table nameJobJoin

run a JDBC-->BDB LuSql instance each for City+nameCityJoin and
Job+nameJobJoin; then run a JDBC-->SolrJ instance on table Name, adding 2
BDBJoinFilters, each of which takes the BDB generated earlier and does the
join (you just tell the filters which field from the JDBC-generated record to
use against the BDB key).

So your use case is just a larger example of this.

Also of interest:
- Java RMI (Remote Method Invocation): both an RMISink(Server) and
RMISource(Client) are implemented. This means you can set up N
machines which are doing something, and have one or more clients (on
their own machines) that are pulling this data and doing something
with it. For example, JDBC-->PDFToTextFilter-->RMI (converting PDF
files to text based on the contents of a SQL database, with text files
in the file system): basically doing some heavy lifting, and then
start up an RMI-->SolrJ (or Lucene) instance which is a client to the N PDF
converting machines, doing only the Lucene/SOLR indexing. The client
does a pull when it needs more data. You can have N servers x M
clients! Oh, string fields of length > 1024 are automatically gzipped by
the RMI Sink(Server), to reduce network traffic (at the cost of cpu;
selectable). I am looking into RMI alternatives, like Thrift and ProtoBuf,
for my next Sources/Sinks to implement. Another example is the reverse
use case: when the indexing is more expensive than getting the data.
Example: one JDBC-->RMISink(Server) instance, N
RMISource(Client)-->Lucene instances; this allows multiple Lucenes to
be fed from a single JDBC source, across machines.

- TeeSink: the Tee sink hides N sinks, so you can split the pipeline
into multiple Sinks. I've used it to send the same content to Lucene
as well as BDB in one fell swoop. Can you say index and content store
in one step?

I am working on cleaning up the code, writing docs (I made the mistake
of making great docs for LusqlV1, so I have work to do...!), and
making a couple more tests.

I will announce the beta on this and the Lucene list.

If you have any questions, please contact me.

Thanks,
Glen Newton
http://zzzoot.blogspot.com

-- Old LuSql benchmarks:
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html

On Thu, Dec 16, 2010 at 12:04 PM, Dyer, James james.d...@ingrambook.com wrote:
 We have ~50 long-running SQL queries that need to be joined and denormalized. 
  Not all of the queries are to the same db, and some data comes from 
 fixed-width data feeds.  Our current search engine (that we are converting to 
 SOLR) has a fast disk-caching mechanism that lets you cache all of these data 
 sources and then it will join them locally prior to indexing.

 I'm in the process of developing something similar for DIH that uses the 
 Berkeley DB to do the same thing.  It's good enough that I can do nightly full 
 re-indexes of all our data while developing the front-end, but it is still 
 very rough.  Possibly I would like to get this refined enough to eventually 
 submit as a jira ticket / patch as it seems this is a somewhat common problem 
 that needs solving.

 Even with our current search engine, the join & denormalize step is always 
 the longest-running part of the process.  However, I have it running fairly 
 fast by partitioning the data by a modulus of the primary key and then 
 running several jobs in parallel.  The trick is not to get I/O bound.  Things 
 run fast if you can set it up to maximize CPU.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Ephraim Ofir [mailto:ephra...@icq.com]
 Sent: Thursday, December 16, 2010 3:04 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Dataimport performance

 Check out 
 http://mail-archives.apache.org/mod_mbox/lucene

Re: Dataimport performance

2010-12-15 Thread Adam Estrada
What version of Solr are you using?

Adam

2010/12/15 Robert Gründler rob...@dubture.com

 Hi,

 we're looking for some comparison-benchmarks for importing large tables
 from a mysql database (full import).

 Currently, a full-import of ~ 8 Million rows from a MySQL database takes
 around 3 hours, on a QuadCore Machine with 16 GB of
 ram and a Raid 10 storage setup. Solr is running on a apache tomcat
 instance, where it is the only app. The tomcat instance
 has the following memory-related java_opts:

 -Xms4096M -Xmx5120M


 The data-config.xml looks like this (only 1 entity):

  <entity name="track" query="select t.id as id, t.title as title,
          l.title as label from track t left join label l on (l.id = t.label_id)
          where t.deleted = 0" transformer="TemplateTransformer">
    <field column="title" name="title_t" />
    <field column="label" name="label_t" />
    <field column="id" name="sf_meta_id" />
    <field column="metaclass" template="Track" name="sf_meta_class"/>
    <field column="metaid" template="${track.id}" name="sf_meta_id"/>
    <field column="uniqueid" template="Track_${track.id}" name="sf_unique_id"/>

    <entity name="artists" query="select a.name as artist from artist a
            left join track_artist ta on (ta.artist_id = a.id) where
            ta.track_id=${track.id}">
      <field column="artist" name="artists_t" />
    </entity>

  </entity>


 We have the feeling that 3 hours for this import is quite long - regarding
 the performance of the server running solr/mysql.

 Are we wrong with that assumption, or do people experience similar import
 times with this amount of data to be imported?


 thanks!


 -robert






Re: Dataimport performance

2010-12-15 Thread Erick Erickson
You're adding on the order of 750 rows (docs)/second, which isn't bad...

have you profiled the machine as this runs? Even just with top (assuming
unix)... because the very first question is always "what takes the time:
getting the data from MySQL, indexing, or I/O?"

If you aren't maxing out your CPU, then you probably want to explore the
other questions (db query speed, network latency) to get a sense of whether
you're going as fast as you can or not...
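
One quick, rough way to check the db side by itself (a hedged sketch only;
the query is taken from your config, but the connection details are
illustrative) is to time the raw JDBC fetch with no indexing at all and
compare it against the full import time:

// Hedged sketch: drain the main DIH query through plain JDBC and time it,
// so the "MySQL + network" cost can be compared to the total import time.
import java.sql.*;

public class QueryOnlyBenchmark {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/music", "user", "password");
        Statement st = con.createStatement();
        long start = System.currentTimeMillis();
        ResultSet rs = st.executeQuery(
                "select t.id as id, t.title as title, l.title as label "
              + "from track t left join label l on (l.id = t.label_id) "
              + "where t.deleted = 0");
        long rows = 0;
        while (rs.next()) rows++;          // drain the result set, do no indexing
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(rows + " rows fetched in " + elapsed + " ms");
        rs.close();
        st.close();
        con.close();
    }
}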

Best
Erick

2010/12/15 Robert Gründler rob...@dubture.com

 Hi,

 we're looking for some comparison-benchmarks for importing large tables
 from a mysql database (full import).

 Currently, a full-import of ~ 8 Million rows from a MySQL database takes
 around 3 hours, on a QuadCore Machine with 16 GB of
 ram and a Raid 10 storage setup. Solr is running on a apache tomcat
 instance, where it is the only app. The tomcat instance
 has the following memory-related java_opts:

 -Xms4096M -Xmx5120M


 The data-config.xml looks like this (only 1 entity):

  entity name=track query=select t.id as id, t.title as title,
 l.title as label from track t left join label l on (l.id = t.label_id)
 where t.deleted = 0 transformer=TemplateTransformer
field column=title name=title_t /
field column=label name=label_t /
field column=id name=sf_meta_id /
field column=metaclass template=Track name=sf_meta_class/
field column=metaid template=${track.id} name=sf_meta_id/
field column=uniqueid template=Track_${track.id}
 name=sf_unique_id/

entity name=artists query=select a.name as artist from artist a
 left join track_artist ta on (ta.artist_id = a.id) where ta.track_id=${
 track.id}
  field column=artist name=artists_t /
/entity

  /entity


 We have the feeling that 3 hours for this import is quite long - regarding
 the performance of the server running solr/mysql.

 Are we wrong with that assumption, or do people experience similar import
 times with this amount of data to be imported?


 thanks!


 -robert






Re: Dataimport performance

2010-12-15 Thread Robert Gründler
 What version of Solr are you using?


Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55


-robert



 
 Adam
 
 2010/12/15 Robert Gründler rob...@dubture.com
 
 Hi,
 
 we're looking for some comparison-benchmarks for importing large tables
 from a mysql database (full import).
 
 Currently, a full-import of ~ 8 Million rows from a MySQL database takes
 around 3 hours, on a QuadCore Machine with 16 GB of
 ram and a Raid 10 storage setup. Solr is running on a apache tomcat
 instance, where it is the only app. The tomcat instance
 has the following memory-related java_opts:
 
 -Xms4096M -Xmx5120M
 
 
 The data-config.xml looks like this (only 1 entity):
 
 entity name=track query=select t.id as id, t.title as title,
 l.title as label from track t left join label l on (l.id = t.label_id)
 where t.deleted = 0 transformer=TemplateTransformer
   field column=title name=title_t /
   field column=label name=label_t /
   field column=id name=sf_meta_id /
   field column=metaclass template=Track name=sf_meta_class/
   field column=metaid template=${track.id} name=sf_meta_id/
   field column=uniqueid template=Track_${track.id}
 name=sf_unique_id/
 
   entity name=artists query=select a.name as artist from artist a
 left join track_artist ta on (ta.artist_id = a.id) where ta.track_id=${
 track.id}
 field column=artist name=artists_t /
   /entity
 
 /entity
 
 
 We have the feeling that 3 hours for this import is quite long - regarding
 the performance of the server running solr/mysql.
 
 Are we wrong with that assumption, or do people experience similar import
 times with this amount of data to be imported?
 
 
 thanks!
 
 
 -robert
 
 
 
 



Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
We are currently running Solr 4.x from trunk.

-d64 -Xms10240M -Xmx10240M

Total Rows Fetched: 24935988
Total Documents Skipped: 0
Total Documents Processed: 24568997
Time Taken: 5:55:19.104

24.5 Million Docs as XML from the filesystem in less than 6 hours.

Maybe your MySQL is the bottleneck?

Regards
Bernd


Am 15.12.2010 14:40, schrieb Robert Gründler:
 Hi,
 
 we're looking for some comparison-benchmarks for importing large tables from 
 a mysql database (full import).
 
 Currently, a full-import of ~ 8 Million rows from a MySQL database takes 
 around 3 hours, on a QuadCore Machine with 16 GB of
 ram and a Raid 10 storage setup. Solr is running on a apache tomcat instance, 
 where it is the only app. The tomcat instance
 has the following memory-related java_opts:
 
 -Xms4096M -Xmx5120M
 
 
 The data-config.xml looks like this (only 1 entity):
 
   entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) where 
 t.deleted = 0 transformer=TemplateTransformer
 field column=title name=title_t /
 field column=label name=label_t /
 field column=id name=sf_meta_id /
 field column=metaclass template=Track name=sf_meta_class/
 field column=metaid template=${track.id} name=sf_meta_id/
 field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/
 
 entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
   field column=artist name=artists_t /
 /entity
 
   /entity
 
 
 We have the feeling that 3 hours for this import is quite long - regarding 
 the performance of the server running solr/mysql. 
 
 Are we wrong with that assumption, or do people experience similar import 
 times with this amount of data to be imported?
 
 
 thanks!
 
 
 -robert
 
 
 

-- 
*
Bernd FehlingUniversitätsbibliothek Bielefeld
Dipl.-Inform. (FH)Universitätsstr. 25
Tel. +49 521 106-4060   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Dataimport performance

2010-12-15 Thread Tim Heckman
2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):

      <entity name="track" query="select t.id as id, t.title as title, l.title
              as label from track t left join label l on (l.id = t.label_id) where
              t.deleted = 0" transformer="TemplateTransformer">
        <field column="title" name="title_t" />
        <field column="label" name="label_t" />
        <field column="id" name="sf_meta_id" />
        <field column="metaclass" template="Track" name="sf_meta_class"/>
        <field column="metaid" template="${track.id}" name="sf_meta_id"/>
        <field column="uniqueid" template="Track_${track.id}" name="sf_unique_id"/>

        <entity name="artists" query="select a.name as artist from artist a
                left join track_artist ta on (ta.artist_id = a.id) where
                ta.track_id=${track.id}">
          <field column="artist" name="artists_t" />
        </entity>

      </entity>

So there's one track entity with an artist sub-entity. My (admittedly
rather limited) experience has been that sub-entities, where you have
to run a separate query for every row in the parent entity, really
slow down data import. For my own purposes, I wrote a custom data
import using SolrJ to improve the performance (from 3 hours to 10
minutes).

Just as a test, how long does it take if you comment out the artists entity?


Re: Dataimport performance

2010-12-15 Thread Robert Gründler
i've benchmarked the import already with 500k records, one time without the 
artists subquery, and one time without the join in the main query:


Without subquery: 500k in 3 min 30 sec

Without join and without subquery: 500k in 2 min 30.

With subquery and with left join:   320k in 6 Min 30


so the joins / subqueries are definitely a bottleneck. 

How exactly did you implement the custom data import? 

In our case, we need to de-normalize the relations of the sql data for the 
index, 
so i fear i can't really get rid of the join / subquery.


-robert





On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):
 
  entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) where 
 t.deleted = 0 transformer=TemplateTransformer
field column=title name=title_t /
field column=label name=label_t /
field column=id name=sf_meta_id /
field column=metaclass template=Track name=sf_meta_class/
field column=metaid template=${track.id} name=sf_meta_id/
field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/
 
entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
  field column=artist name=artists_t /
/entity
 
  /entity
 
 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).
 
 Just as a test, how long does it take if you comment out the artists entity?



Re: Dataimport performance

2010-12-15 Thread Tim Heckman
The custom import I wrote is a java application that uses the SolrJ
library. Basically, where I had sub-entities in the DIH config I did
the mappings inside my java code.

1. Identify a subset or chunk of the primary id's to work on (so I
don't have to load everything into memory at once) and put those in a
temp table. I used a modulus on the id.
2. Select all of the outer entity from the database (joining on the
id's in the temp table), and load the data from that result set into
new solr input documents. I keep these in a hash map keyed on the
id's.
3. Then select all of the inner entity, joining on the id's from the
temp table. The result set has to include the id's from step 2. I go
through this result set and load the data into the matching solr input
documents from step 2.
4. Push that set of input documents to solr (optionally committing
them), then go back to step 1 using the next subset or chunk.

Not sure if this is the absolute best approach, but it's working well
enough for my specific case.
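
Sketched out very roughly (this is not my production code; the field names are
taken from the config earlier in the thread, the connection details are
illustrative, and for brevity the chunking uses a modulus directly instead of
the temp table), the loop looks something like this:

// Hedged sketch of the chunked SolrJ import described above (Solr 1.4-era SolrJ).
import java.sql.*;
import java.util.*;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ChunkedSolrJImport {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/music", "user", "password");
        int chunks = 16;                       // step 1: chunk the ids by a modulus
        for (int chunk = 0; chunk < chunks; chunk++) {
            Map<Long, SolrInputDocument> docs = new HashMap<Long, SolrInputDocument>();

            // step 2: outer entity for this chunk, one document per row
            PreparedStatement outer = con.prepareStatement(
                    "select t.id, t.title from track t "
                  + "where t.deleted = 0 and mod(t.id, ?) = ?");
            outer.setInt(1, chunks);
            outer.setInt(2, chunk);
            ResultSet rs = outer.executeQuery();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("sf_meta_id", rs.getLong(1));
                doc.addField("title_t", rs.getString(2));
                docs.put(rs.getLong(1), doc);
            }
            rs.close();
            outer.close();

            // step 3: inner entity for the same chunk, merged into matching documents
            PreparedStatement inner = con.prepareStatement(
                    "select ta.track_id, a.name from artist a "
                  + "join track_artist ta on ta.artist_id = a.id "
                  + "where mod(ta.track_id, ?) = ?");
            inner.setInt(1, chunks);
            inner.setInt(2, chunk);
            ResultSet ars = inner.executeQuery();
            while (ars.next()) {
                SolrInputDocument doc = docs.get(ars.getLong(1));
                if (doc != null) doc.addField("artists_t", ars.getString(2));
            }
            ars.close();
            inner.close();

            // step 4: push this chunk to Solr, then move on to the next one
            if (!docs.isEmpty()) solr.add(docs.values());
        }
        solr.commit();
        con.close();
    }
}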

Tim


2010/12/15 Robert Gründler rob...@dubture.com:
 i've benchmarked the import already with 500k records, one time without the 
 artists subquery, and one time without the join in the main query:


 Without subquery: 500k in 3 min 30 sec

 Without join and without subquery: 500k in 2 min 30.

 With subquery and with left join:   320k in 6 Min 30


 so the joins / subqueries are definitely a bottleneck.

 How exactly did you implement the custom data import?

 In our case, we need to de-normalize the relations of the sql data for the 
 index,
 so i fear i can't really get rid of the join / subquery.


 -robert





 On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):

      entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) 
 where t.deleted = 0 transformer=TemplateTransformer
        field column=title name=title_t /
        field column=label name=label_t /
        field column=id name=sf_meta_id /
        field column=metaclass template=Track name=sf_meta_class/
        field column=metaid template=${track.id} name=sf_meta_id/
        field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/

        entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
          field column=artist name=artists_t /
        /entity

      /entity

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).

 Just as a test, how long does it take if you comment out the artists entity?




Re: Dataimport performance

2010-12-15 Thread Lance Norskog
Can you do just one join in the top-level query? The DIH does not have
a batching mechanism for these joins, but your database does.
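
(Outside DIH, a hedged sketch of that single-join idea - table and connection
names are illustrative: order the one joined query by the parent id and cut a
new document whenever the id changes, so the database does all the batching.)

// Hedged sketch: one top-level join, ordered by parent id, grouped in code.
import java.sql.*;

public class SingleJoinImport {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/music", "user", "password");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery(
              "select t.id, t.title, a.name as artist "
            + "from track t "
            + "left join track_artist ta on ta.track_id = t.id "
            + "left join artist a on a.id = ta.artist_id "
            + "where t.deleted = 0 "
            + "order by t.id");
        long currentId = -1;
        while (rs.next()) {
            long id = rs.getLong("id");
            if (id != currentId) {
                // ... flush the previous document (if any) and start a new one
                currentId = id;
            }
            String artist = rs.getString("artist");
            // ... append 'artist' (may be null for tracks with no artists) to the open document
        }
        // ... flush the last document
        rs.close();
        st.close();
        con.close();
    }
}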

On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman theck...@gmail.com wrote:
 The custom import I wrote is a java application that uses the SolrJ
 library. Basically, where I had sub-entities in the DIH config I did
 the mappings inside my java code.

 1. Identify a subset or chunk of the primary id's to work on (so I
 don't have to load everything into memory at once) and put those in a
 temp table. I used a modulus on the id.
 2. Select all of the outer entity from the database (joining on the
 id's in the temp table), and load the data from that result set into
 new solr input documents. I keep these in a hash map keyed on the
 id's.
 3. Then select all of the inner entity, joining on the id's from the
 temp table. The result set has to include the id's from step 2. I go
 through this result set and load the data into the matching solr input
 documents from step 2.
 4. Push that set of input documents to solr (optionally committing
 them), then go back to step 1 using the next subset or chunk.

 Not sure if this is the absolute best approach, but it's working well
 enough for my specific case.

 Tim


 2010/12/15 Robert Gründler rob...@dubture.com:
 i've benchmarked the import already with 500k records, one time without the 
 artists subquery, and one time without the join in the main query:


 Without subquery: 500k in 3 min 30 sec

 Without join and without subquery: 500k in 2 min 30.

 With subquery and with left join:   320k in 6 Min 30


 so the joins / subqueries are definitely a bottleneck.

 How exactly did you implement the custom data import?

 In our case, we need to de-normalize the relations of the sql data for the 
 index,
 so i fear i can't really get rid of the join / subquery.


 -robert





 On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):

      entity name=track query=select t.id as id, t.title as title, 
 l.title as label from track t left join label l on (l.id = t.label_id) 
 where t.deleted = 0 transformer=TemplateTransformer
        field column=title name=title_t /
        field column=label name=label_t /
        field column=id name=sf_meta_id /
        field column=metaclass template=Track name=sf_meta_class/
        field column=metaid template=${track.id} name=sf_meta_id/
        field column=uniqueid template=Track_${track.id} 
 name=sf_unique_id/

        entity name=artists query=select a.name as artist from artist a 
 left join track_artist ta on (ta.artist_id = a.id) where 
 ta.track_id=${track.id}
          field column=artist name=artists_t /
        /entity

      /entity

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).

 Just as a test, how long does it take if you comment out the artists entity?






-- 
Lance Norskog
goks...@gmail.com


Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-06 Thread stockii

maybe encoding !? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Could-not-load-driver-com-mysql-jdbc-Driver-tp2021616p2027138.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-05 Thread Koji Sekiguchi

(10/12/05 18:38), Ruixiang Zhang wrote:

*I got the following error for dataimport:*

*Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver*

I have the following files:

\example-DIH\solr\db\conf\   solrconfig.xml, schema.xml, db-data-config.xml,
dataimport.properties
\example-DIH\solr\db\lib\   mysql-connector-java-5.1.13-bin.jar


I guess the problem is the permission of the driver file?

Koji
--
http://www.rondhuit.com/en/


Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-05 Thread Ruixiang Zhang
Thanks Koji.

I just tried to change the permission of the driver file to 777; it still can
not find the driver.

I put the driver into the folder where the original driver is and deleted the
original one. I don't know why solr can find the original one (if I don't
change anything), but not this one.

Thanks
Richard


On Sun, Dec 5, 2010 at 2:46 AM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (10/12/05 18:38), Ruixiang Zhang wrote:

 *I got the following error for dataimport:*


 *Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 load driver: com.mysql.jdbc.Driver*

 I have the following files:

 \example-DIH\solr\db\conf\   solrconfig.xml, schema.xml,
 db-data-config.xml,
 dataimport.properties
 \example-DIH\solr\db\lib\   mysql-connector-java-5.1.13-bin.jar


 I guess the problem is the permission of the driver file?

 Koji
 --
 http://www.rondhuit.com/en/



Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-05 Thread Ruixiang Zhang
And here are the logs:


Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImportHandler
processConfiguration
INFO: Processing configuration from solrconfig.xml:
{config=db-data-config.xml}
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
verifyWithSchema
INFO: The field :title present in DataConfig does not have a counterpart in
Solr Schema
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
verifyWithSchema
INFO: The field :url present in DataConfig does not have a counterpart in
Solr Schema
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
*org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver* Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:114)
at
org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:62)
at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:304)
at
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:94)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:52)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
*Caused by: java.lang.ClassNotFoundException: Unable to load
com.mysql.jdbc.Driver or
org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver*
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:738)
at
org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
... 32 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'com.mysql.jdbc.Driver'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:728)
... 33 more
*Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver*
at 

Re: Dataimport: Could not load driver: com.mysql.jdbc.Driver

2010-12-05 Thread Ruixiang Zhang
Hi Koji

I finally found the reason for this problem:

I downloaded the tar file of the driver and unzipped it on Windows, then I put
the jar file onto the server. I don't know why, but that doesn't work. It
works when I put the tar file on the server and unzip it there.

Thanks a lot for your time!!!
Richard



On Sun, Dec 5, 2010 at 3:02 AM, Ruixiang Zhang rxzh...@gmail.com wrote:

 And here are the logs:


 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImportHandler
 processConfiguration
 INFO: Processing configuration from solrconfig.xml:
 {config=db-data-config.xml}
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
 loadDataConfig
 INFO: Data Configuration loaded successfully
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
 verifyWithSchema
 INFO: The field :title present in DataConfig does not have a counterpart in
 Solr Schema
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
 verifyWithSchema
 INFO: The field :url present in DataConfig does not have a counterpart in
 Solr Schema
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.SolrWriter
 readIndexerProperties
 INFO: Read dataimport.properties
 Dec 5, 2010 2:00:23 AM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 SEVERE: Full Import failed
 *org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
 load driver: com.mysql.jdbc.Driver* Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:114)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:62)
 at
 org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:304)
 at
 org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:94)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:52)
 at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
 at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
 at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
 at
 org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 *Caused by: java.lang.ClassNotFoundException: Unable to load
 com.mysql.jdbc.Driver or
 org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver*
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:738)
 at
 

Re: Dataimport destroys our harddisks

2010-12-02 Thread Erick Erickson
The very first thing I'd ask is how much free space is on your disk
when this occurs? Is it possible that you're simply filling up your
disk?

do note that an optimize may require up to 2X the size of your index
if/when it occurs. Are you sure you aren't optimizing as you add
items to your index?

But I've never heard of Solr causing hard disk crashes, it doesn't do
anything special but read/write...

Best
Erick

2010/12/2 Robert Gründler rob...@dubture.com

 Hi,

 we have a serious harddisk problem, and it's definitely related to a
 full-import from a relational
 database into a solr index.

 The first time it happened on our development server, where the
 raidcontroller crashed during a full-import
 of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2
 of the harddisks where the solr
 index files are located stopped working (we needed to replace them).

 After the crash of the raid controller, we decided to move the development
 of solr/index related stuff to our
 local development machines.

 Yesterday i was running another full-import of ~10 Million documents on my
 local development machine,
 and during the import, a harddisk failure occurred. Since this failure, my
 harddisk activity seems to
 be around 100% all the time, even if no solr server is running at all.

 I've been googling the last 2 days to find some info about solr related
 harddisk problems, but i didn't find anything
 useful.

 Are there any steps we need to take care of in respect to harddisk failures
 when doing a full-import? Right now,
 our steps look like this:

 1. Delete the current index
 2. Restart solr, to load the updated schemas
 3. Start the full import

 Initially, the solr index and the relational database were located on the
 same harddisk. After the crash, we moved
 the index to a separate harddisk, but nevertheless this harddisk crashed
 too.

 I'd really appreciate any hints on what we might do wrong when importing
 data, as we can't release this
 on our production servers when there's the risk of harddisk failures.


 thanks.


 -robert








Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
 The very first thing I'd ask is how much free space is on your disk
 when this occurs? Is it possible that you're simply filling up your
 disk?

no, i've checked that already. all disks have plenty of space (they have
a capacity of 2TB, and are currently filled up to 20%).

 
 do note that an optimize may require up to 2X the size of your index
 if/when it occurs. Are you sure you aren't optimizing as you add
 items to your index?
 

index size is not a problem in our case. Our index currently has about 3GB.

What do you mean by "optimizing as you add items to your index"? 

 But I've never heard of Solr causing hard disk crashes,

neither have we, and google seems to be of the same opinion. 

One thing that i've found is the mergeFactor value:

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

Our sysadmin speculates that maybe the chunk size of our raid/harddisks
and the segment size of the lucene index does not play well together.

Does the lucene segment size affect how the data is written to the disk?


thanks for your help.


-robert







 
 Best
 Erick
 
 2010/12/2 Robert Gründler rob...@dubture.com
 
 Hi,
 
 we have a serious harddisk problem, and it's definitely related to a
 full-import from a relational
 database into a solr index.
 
 The first time it happened on our development server, where the
 raidcontroller crashed during a full-import
 of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2
 of the harddisks where the solr
 index files are located stopped working (we needed to replace them).
 
 After the crash of the raid controller, we decided to move the development
 of solr/index related stuff to our
 local development machines.
 
 Yesterday i was running another full-import of ~10 Million documents on my
 local development machine,
 and during the import, a harddisk failure occurred. Since this failure, my
 harddisk activity seems to
 be around 100% all the time, even if no solr server is running at all.
 
 I've been googling the last 2 days to find some info about solr related
 harddisk problems, but i didn't find anything
 useful.
 
 Are there any steps we need to take care of in respect to harddisk failures
 when doing a full-import? Right now,
 our steps look like this:
 
 1. Delete the current index
 2. Restart solr, to load the updated schemas
 3. Start the full import
 
 Initially, the solr index and the relational database were located on the
 same harddisk. After the crash, we moved
 the index to a separate harddisk, but nevertheless this harddisk crashed
 too.
 
 I'd really appreciate any hints on what we might do wrong when importing
 data, as we can't release this
 on our production servers when there's the risk of harddisk failures.
 
 
 thanks.
 
 
 -robert
 
 
 
 
 
 



Re: Dataimport destroys our harddisks

2010-12-02 Thread Sven Almgren
What Raid controller do you use, and what kernel version? (Assuming
Linux). We had problems during high load with a 3Ware raid controller
and the current kernel for Ubuntu 10.04, we had to downgrade the
kernel...

The problem was a bug in the driver that only showed up with very high
disk load (as is the case when doing imports)

/Sven

2010/12/2 Robert Gründler rob...@dubture.com:
 The very first thing I'd ask is how much free space is on your disk
 when this occurs? Is it possible that you're simply filling up your
 disk?

 no, i've checked that already. all disks have plenty of space (they have
 a capacity of 2TB, and are currently filled up to 20%.


 do note that an optimize may require up to 2X the size of your index
 if/when it occurs. Are you sure you aren't optimizing as you add
 items to your index?


 index size is not a problem in our case. Our index currently has about 3GB.

 What do you mean with optimizing as you add items to your index?

 But I've never heard of Solr causing hard disk crashes,

 neither did we, and google is the same opinion.

 One thing that i've found is the mergeFactor value:

 http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

 Our sysadmin speculates that maybe the chunk size of our raid/harddisks
 and the segment size of the lucene index does not play well together.

 Does the lucene segment size affect how the data is written to the disk?


 thanks for your help.


 -robert








 Best
 Erick

 2010/12/2 Robert Gründler rob...@dubture.com

 Hi,

 we have a serious harddisk problem, and it's definitely related to a
 full-import from a relational
 database into a solr index.

 The first time it happened on our development server, where the
 raidcontroller crashed during a full-import
 of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2
 of the harddisks where the solr
 index files are located stopped working (we needed to replace them).

 After the crash of the raid controller, we decided to move the development
 of solr/index related stuff to our
 local development machines.

 Yesterday i was running another full-import of ~10 Million documents on my
 local development machine,
 and during the import, a harddisk failure occurred. Since this failure, my
 harddisk activity seems to
 be around 100% all the time, even if no solr server is running at all.

 I've been googling the last 2 days to find some info about solr related
 harddisk problems, but i didn't find anything
 useful.

 Are there any steps we need to take care of in respect to harddisk failures
 when doing a full-import? Right now,
 our steps look like this:

 1. Delete the current index
 2. Restart solr, to load the updated schemas
 3. Start the full import

 Initially, the solr index and the relational database were located on the
 same harddisk. After the crash, we moved
 the index to a separate harddisk, but nevertheless this harddisk crashed
 too.

 I'd really appreciate any hints on what we might do wrong when importing
 data, as we can't release this
 on our production servers when there's the risk of harddisk failures.


 thanks.


 -robert










Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
On Dec 2, 2010, at 15:43 , Sven Almgren wrote:

 What Raid controller do you use, and what kernel version? (Assuming
 Linux). We had problems during high load with a 3Ware raid controller
 and the current kernel for Ubuntu 10.04, we had to downgrade the
 kernel...
 
 The problem was a bug in the driver that only showed up with very high
 disk load (as is the case when doing imports)
 

We're running freebsd:

RaidController  3ware 9500S-8
Corrupt unit: Raid-10 3725.27GB 256K Stripe Size without BBU
Freebsd 7.2, UFS Filesystem.



 /Sven
 
 2010/12/2 Robert Gründler rob...@dubture.com:
 The very first thing I'd ask is how much free space is on your disk
 when this occurs? Is it possible that you're simply filling up your
 disk?
 
 no, i've checked that already. all disks have plenty of space (they have
 a capacity of 2TB, and are currently filled up to 20%.
 
 
 do note that an optimize may require up to 2X the size of your index
 if/when it occurs. Are you sure you aren't optimizing as you add
 items to your index?
 
 
 index size is not a problem in our case. Our index currently has about 3GB.
 
 What do you mean with optimizing as you add items to your index?
 
 But I've never heard of Solr causing hard disk crashes,
 
 neither did we, and google is the same opinion.
 
 One thing that i've found is the mergeFactor value:
 
 http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
 
 Our sysadmin speculates that maybe the chunk size of our raid/harddisks
 and the segment size of the lucene index does not play well together.
 
 Does the lucene segment size affect how the data is written to the disk?
 
 
 thanks for your help.
 
 
 -robert
 
 
 
 
 
 
 
 
 Best
 Erick
 
 2010/12/2 Robert Gründler rob...@dubture.com
 
 Hi,
 
 we have a serious harddisk problem, and it's definitely related to a
 full-import from a relational
 database into a solr index.
 
 The first time it happened on our development server, where the
 raidcontroller crashed during a full-import
 of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2
 of the harddisks where the solr
 index files are located stopped working (we needed to replace them).
 
 After the crash of the raid controller, we decided to move the development
 of solr/index related stuff to our
 local development machines.
 
 Yesterday i was running another full-import of ~10 Million documents on my
 local development machine,
 and during the import, a harddisk failure occurred. Since this failure, my
 harddisk activity seems to
 be around 100% all the time, even if no solr server is running at all.
 
 I've been googling the last 2 days to find some info about solr related
 harddisk problems, but i didn't find anything
 useful.
 
 Are there any steps we need to take care of in respect to harddisk failures
 when doing a full-import? Right now,
 our steps look like this:
 
 1. Delete the current index
 2. Restart solr, to load the updated schemas
 3. Start the full import
 
 Initially, the solr index and the relational database were located on the
 same harddisk. After the crash, we moved
 the index to a separate harddisk, but nevertheless this harddisk crashed
 too.
 
 I'd really appreciate any hints on what we might do wrong when importing
 data, as we can't release this
 on our production servers when there's the risk of harddisk failures.
 
 
 thanks.
 
 
 -robert
 
 
 
 
 
 
 
 



Re: Dataimport destroys our harddisks

2010-12-02 Thread Sven Almgren
That's the same series we use... we had problems when running other
disk-heavy operations like rsync and backup on them too..

But in our case we mostly had hangs or load > 180 :P... Can you
simulate very heavy random disk i/o? If so then you could check if you
still have the same problems...

That's all I can be of help with, good luck :)

/Sven

2010/12/2 Robert Gründler rob...@dubture.com:
 On Dec 2, 2010, at 15:43 , Sven Almgren wrote:

 What Raid controller do you use, and what kernel version? (Assuming
 Linux). We had problems during high load with a 3Ware raid controller
 and the current kernel for Ubuntu 10.04, we had to downgrade the
 kernel...

 The problem was a bug in the driver that only showed up with very high
 disk load (as is the case when doing imports)


 We're running freebsd:

 RaidController  3ware 9500S-8
 Corrupt unit: Raid-10 3725.27GB 256K Stripe Size without BBU
 Freebsd 7.2, UFS Filesystem.



 /Sven

 2010/12/2 Robert Gründler rob...@dubture.com:
 The very first thing I'd ask is how much free space is on your disk
 when this occurs? Is it possible that you're simply filling up your
 disk?

 no, i've checked that already. all disks have plenty of space (they have
 a capacity of 2TB, and are currently filled up to 20%.


 do note that an optimize may require up to 2X the size of your index
 if/when it occurs. Are you sure you aren't optimizing as you add
 items to your index?


 index size is not a problem in our case. Our index currently has about 3GB.

 What do you mean with optimizing as you add items to your index?

 But I've never heard of Solr causing hard disk crashes,

 neither did we, and google is the same opinion.

 One thing that i've found is the mergeFactor value:

 http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

 Our sysadmin speculates that maybe the chunk size of our raid/harddisks
 and the segment size of the lucene index does not play well together.

 Does the lucene segment size affect how the data is written to the disk?


 thanks for your help.


 -robert








 Best
 Erick

 2010/12/2 Robert Gründler rob...@dubture.com

 Hi,

 we have a serious harddisk problem, and it's definitely related to a
 full-import from a relational
 database into a solr index.

 The first time it happened on our development server, where the
 raidcontroller crashed during a full-import
 of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2
 of the harddisks where the solr
 index files are located stopped working (we needed to replace them).

 After the crash of the raid controller, we decided to move the development
 of solr/index related stuff to our
 local development machines.

 Yesterday i was running another full-import of ~10 Million documents on my
 local development machine,
 and during the import, a harddisk failure occurred. Since this failure, my
 harddisk activity seems to
 be around 100% all the time, even if no solr server is running at all.

 I've been googling the last 2 days to find some info about solr related
 harddisk problems, but i didn't find anything
 useful.

 Are there any steps we need to take care of in respect to harddisk 
 failures
 when doing a full-import? Right now,
 our steps look like this:

 1. Delete the current index
 2. Restart solr, to load the updated schemas
 3. Start the full import

 Initially, the solr index and the relational database were located on the
 same harddisk. After the crash, we moved
 the index to a separate harddisk, but nevertheless this harddisk crashed
 too.

 I'd really appreciate any hints on what we might do wrong when importing
 data, as we can't release this
 on our production servers when there's the risk of harddisk failures.


 thanks.


 -robert












Re: DataImport problem

2010-09-04 Thread Lance Norskog
The RSS example does not do this. It declares only the source, and gives 
all of the parameters in the entity.


You can have different entities with different uses of the datasource.

In general, the DIH is easier to use when starting with one of the 
examples and slowly changing one thing at a time.


Lance

Jason Chaffee wrote:

I am getting the following error with the DataImport and I am not sure why, as I 
am following the documentation.  I am trying to use XPath and the URLDataSource, 
but it fails to load the datasource.

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource 
:null available for entity :store Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
at 
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:94)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:78)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Sep 4, 2010 7:12:51 PM org.apache.solr.update.DirectUpdateHandler2 rollback


Here is my dataconfig.xml:

<dataConfig>
   <dataSource name="store" type="URLDataSource" baseUrl="http://localhost:8080/app" encoding="UTF-8"
               connectionTimeout="5000" readTimeout="6" />
   <document name="store">
     <entity name="store" processor="XPathEntityProcessor" stream="true"
             forEach="/stores/store" url="/store/list">
       <field column="id" xpath="/stores/store/id" />
       <field column="name" xpath="/stores/store/name" />
     </entity>
   </document>
</dataConfig>
   


Re: DataImport issue with large number of documents

2010-06-08 Thread Glen Newton
As the index gets larger, the underlying housekeeping of the Lucene
index sometimes causes pauses in the indexing. The JDBC connection
(and/or the underlying socket) to the MySql database can time out
during these pauses.

- If it is not set, you should add this to your JDBC url: autoReconnect=true
- Increase the netTimeoutForStreamingResults value, as described at
http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/
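
Both settings can go straight into the JDBC URL of the DIH dataSource; a
sketch (database name, credentials and the timeout value are placeholders, and
the & separator has to be escaped as &amp; inside the XML attribute):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb?autoReconnect=true&amp;netTimeoutForStreamingResults=3600"
            batchSize="-1"
            user="solr" password="..." />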

See also: 
http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-td817458.html

-Glen Newton
http://zzzoot.blogspot.com/

On 09/06/2010, Giri giriprak...@gmail.com wrote:
 Hi Group,

 I have been trying to index about 70 million records into the Solr index; the
 data is coming from a MySQL database, and I am using the DataImportHandler
 with batchSize set to -1. When I perform a full-import, it indexes about 27
 million records and then throws the following exception:

 Any help will be really appreciated!

 thanks!

 Giri

 ---

 WARNING: Error reading data
 com.mysql.jdbc.CommunicationsException: Communications link failure due to
 underlying exception:

 ** BEGIN NESTED EXCEPTION **

 java.io.EOFException

 STACKTRACE:

 java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1934)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2433)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:798)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1316)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:370)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:360)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:5897)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:265)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:161)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:196)
at
 org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)


 ** END NESTED EXCEPTION **



 Last packet sent to the server was 5359471 ms ago.
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2592)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:798)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1316)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:370)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:360)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:5897)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:265)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:161)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:196)
at
 org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)



-- 

-


Re: DataImport issue with large number of documents

2010-06-08 Thread Giri
Hi Glen,

Thank you very much for the quick response. I would like to try increasing
netTimeoutForStreamingResults; is that something I do on the
MySQL side or on the Solr side?

Giri

On Tue, Jun 8, 2010 at 6:17 PM, Glen Newton glen.new...@gmail.com wrote:

 As the index gets larger, the underlying housekeeping of the Lucene
 index sometimes causes pauses in the indexing. The JDBC connection
 (and/or the underlying socket) to the MySql database can time out
 during these pauses.

 - If it is not set, you should add this to your JDBC url:
 autoReconnect=true
 - Increase the netTimeoutForStreamingResults value, as described at

 http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/

 See also:
 http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-td817458.html

 -Glen Newton
 http://zzzoot.blogspot.com/

 On 09/06/2010, Giri giriprak...@gmail.com wrote:
  Hi Group,
 
  I have been trying index about 70 million records in the solr index, the
  data is coming from the MySQL database, and I am using the
 DataImportHandler
  with batchSize set to -1. When I perform a full-import, it indexes about
 27
  million records then throws the following exception:
 
  Any help will be really appreciated!
 
  thanks!
 
  Giri
 
  ---
 
  WARNING: Error reading data
  com.mysql.jdbc.CommunicationsException: Communications link failure due
 to
  underlying exception:
 
  ** BEGIN NESTED EXCEPTION **
 
  java.io.EOFException
 
  STACKTRACE:
 
  java.io.EOFException
 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1934)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2433)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:798)
 at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1316)
 at
 com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:370)
 at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:360)
 at com.mysql.jdbc.ResultSet.next(ResultSet.java:5897)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:265)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:161)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:196)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
 at
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
 at
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
 at
 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
 at
 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
 
 
  ** END NESTED EXCEPTION **
 
 
 
  Last packet sent to the server was 5359471 ms ago.
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2592)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:798)
 at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1316)
 at
 com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:370)
 at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:360)
 at com.mysql.jdbc.ResultSet.next(ResultSet.java:5897)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:265)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$500(JdbcDataSource.java:161)
 at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:196)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:229)
 at
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:77)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
 at
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
 at
 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
 at
 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
 


 --

 -



Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
First of all, let us confirm whether this issue is fixed in 1.4.

1.4 is stable, a lot of people are using it in production, and it is
going to be released pretty soon.

On Mon, Sep 14, 2009 at 8:05 PM, palexv pal...@gmail.com wrote:

 I am using 1.3
 Do you suggest 1.4 from developer trunk? I am concern if it stable. Is it
 safe to use it in big commerce app?



 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 which version of Solr are you using. can you try with a recent one and
 confirm this?

 On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:

 I know that my issue is related to
 http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
 and https://issues.apache.org/jira/browse/SOLR-728
 but my case is quite different.
 As I understand patch at https://issues.apache.org/jira/browse/SOLR-728
 prevents concurrent executing of import operation but does NOT put
 command
 in a queue.

 I have only few records to index. When run full reindex - it works very
 fast. But when I try to rerun this even after a couple of seconds - I am
 getting
 Caused by:
 com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
 No operations allowed after connection closed.

 At this time, when I check status - it says that status is idle and
 everything was indexed success.
 Second run of reindex without exception I can run only after 10 seconds.
 It does not work for me! If I apply patch from
 https://issues.apache.org/jira/browse/SOLR-728 - I will unable to reindex
 in
 next 10 seconds as well.
 Any suggestions?
 --
 View this message in context:
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



 --
 View this message in context: 
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436948.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-15 Thread Fuad Efendi

Easy fix: use autoReconnect=true for MySQL:

jdbc:mysql://localhost:3306/?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true


Maybe it will help; the connection is auto-closed after a couple of seconds
(usually 10 seconds) by default for MySQL... connection pooling won't help
(their JDBC driver is already pool-based, and the server closes connections after some
delay)


-Fuad
(MySQL contributor)




 -Original Message-
 From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble
 Paul നോബിള്‍ नोब्ळ्
 Sent: September-15-09 3:48 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Dataimport MySQLNonTransientConnectionException: No operations
 allowed after connection closed
 
 First of all let us confirm this issue is fixed in 1.4.
 
 1.4 is stable and a lot of people are using it in production and it is
 going to be released pretty soon
 
 On Mon, Sep 14, 2009 at 8:05 PM, palexv pal...@gmail.com wrote:
 
  I am using 1.3
  Do you suggest 1.4 from developer trunk? I am concern if it stable. Is it
  safe to use it in big commerce app?
 
 
 
  Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
  which version of Solr are you using. can you try with a recent one and
  confirm this?
 
  On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:
 
  I know that my issue is related to
  http://www.nabble.com/dataimporthandler-and-multiple-delta-import-
 td19160129.html#a19160129
  and https://issues.apache.org/jira/browse/SOLR-728
  but my case is quite different.
  As I understand patch at https://issues.apache.org/jira/browse/SOLR-728
  prevents concurrent executing of import operation but does NOT put
  command
  in a queue.
 
  I have only few records to index. When run full reindex - it works very
  fast. But when I try to rerun this even after a couple of seconds - I am
  getting
  Caused by:
  com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
  No operations allowed after connection closed.
 
  At this time, when I check status - it says that status is idle and
  everything was indexed success.
  Second run of reindex without exception I can run only after 10 seconds.
  It does not work for me! If I apply patch from
  https://issues.apache.org/jira/browse/SOLR-728 - I will unable to reindex
  in
  next 10 seconds as well.
  Any suggestions?
  --
  View this message in context:
  http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-
 No-operations-allowed-after-connection-closed-tp25436605p25436605.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
  --
  View this message in context: http://www.nabble.com/Dataimport-
 MySQLNonTransientConnectionException%3A-No-operations-allowed-after-
 connection-closed-tp25436605p25436948.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-14 Thread palexv

I am using 1.3.
Do you suggest 1.4 from the developer trunk? I am concerned about whether it is stable. Is it
safe to use in a big commerce app?



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 which version of Solr are you using. can you try with a recent one and
 confirm this?
 
 On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:

 I know that my issue is related to
 http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
 and https://issues.apache.org/jira/browse/SOLR-728
 but my case is quite different.
 As I understand patch at https://issues.apache.org/jira/browse/SOLR-728
 prevents concurrent executing of import operation but does NOT put
 command
 in a queue.

 I have only few records to index. When run full reindex - it works very
 fast. But when I try to rerun this even after a couple of seconds - I am
 getting
 Caused by:
 com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
 No operations allowed after connection closed.

 At this time, when I check status - it says that status is idle and
 everything was indexed success.
 Second run of reindex without exception I can run only after 10 seconds.
 It does not work for me! If I apply patch from
 https://issues.apache.org/jira/browse/SOLR-728 - I will unable to reindex
 in
 next 10 seconds as well.
 Any suggestions?
 --
 View this message in context:
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 
 

-- 
View this message in context: 
http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436948.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which version of Solr are you using? Can you try with a recent one and
confirm this?

On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:

 I know that my issue is related to
 http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
 and https://issues.apache.org/jira/browse/SOLR-728
 but my case is quite different.
 As I understand it, the patch at https://issues.apache.org/jira/browse/SOLR-728
 prevents concurrent execution of the import operation but does NOT put the command
 in a queue.

 I have only a few records to index. When I run a full reindex it works very
 fast. But when I try to rerun it even after a couple of seconds, I am
 getting
 Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
 No operations allowed after connection closed.

 At this time, when I check the status, it says the status is idle and
 everything was indexed successfully.
 A second reindex without the exception can only be run after 10 seconds.
 That does not work for me! If I apply the patch from
 https://issues.apache.org/jira/browse/SOLR-728, I will be unable to reindex in
 the next 10 seconds as well.
 Any suggestions?
 --
 View this message in context: 
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DataImport, remove doc when marked as deleted

2009-04-17 Thread Ruben Chadien

I have now :-)
Thanks, missed that in the wiki.
Ruben

On Apr 16, 2009, at 7:10 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



did you try the deletedPkQuery?

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com 
 wrote:

Hi

I am new to Solr, but have been using Lucene for a while. I am  
trying to

rewrite
some old lucene indexing code using the Jdbc DataImport i Solr, my  
problem:


I have Entities that can be marked in the db as deleted, these i  
don't

want to index
and thats no problem when doing a full-import. When doing a delta- 
import my

deltaQuery will catch
Entities that has been marked as deleted since last index, but how  
do i get

it to delete those from the index ?
I tried making the deltaImportQuery so that in don't return the  
Entity if

its deleted, that didnt help...

Any ideas ?

Thanks
Ruben







--
--Noble Paul




Re: DataImport, remove doc when marked as deleted

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you try the deletedPkQuery?
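
For reference, deletedPkQuery sits on the entity next to deltaQuery and should
return the primary keys of the rows that have to be removed from the index; a
sketch with made-up table and column names:

<entity name="item" pk="id"
        query="select * from item where deleted = 0"
        deltaQuery="select id from item
                    where last_modified > '${dataimporter.last_index_time}' and deleted = 0"
        deltaImportQuery="select * from item where id = '${dataimporter.delta.id}'"
        deletedPkQuery="select id from item
                        where deleted = 1 and last_modified > '${dataimporter.last_index_time}'">
  <field column="id" name="id" />
</entity>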

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com wrote:
 Hi

 I am new to Solr, but have been using Lucene for a while. I am trying to
 rewrite
 some old Lucene indexing code using the JDBC DataImport in Solr; my problem:

 I have entities that can be marked in the db as deleted. These I don't
 want to index,
 and that's no problem when doing a full-import. When doing a delta-import my
 deltaQuery will catch
 entities that have been marked as deleted since the last index, but how do I get
 it to delete those from the index?
 I tried making the deltaImportQuery so that it doesn't return the entity if
 it's deleted, but that didn't help...

 Any ideas ?

 Thanks
 Ruben






-- 
--Noble Paul


Re: DataImport TXT file entity processor

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
An EntityProcessor looks right to me. It may help us add more
attributes if needed.

PlainTextEntityProcessor looks like a good name. It can also be used
to read HTML etc.
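
Usage, as it later ended up documented on the wiki, is roughly as follows (the
url and field names here are placeholders); the whole file content arrives in
an implicit column called plainText:

<entity name="doc"
        processor="PlainTextEntityProcessor"
        dataSource="fileReader"
        url="file:///data/docs/a.txt">
  <field column="plainText" name="text" />
</entity>

where fileReader is a FileDataSource or URLDataSource declared elsewhere in the
data-config.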
--Noble

On Sat, Jan 24, 2009 at 12:37 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Sat, Jan 24, 2009 at 5:56 AM, Nathan Adams na...@umich.edu wrote:

 Is there a way to us Data Import Handler to index non-XML (i.e. simple
 text) files (either via HTTP or FileSystem)?  I need to put the entire
 contents of a text file into a single field of a document and the other
 fields are being pulled out of Oracle...


 Not yet. But I think it will be nice to have. Can you open an issue in Jira?

 I think importing from HTTP was something another user had asked for
 recently. How do you get the url/path of this text file? That would help
 decide if we need a Transformer or EntityProcessor for these tasks.
 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


Re: DataImport

2009-01-06 Thread Performance

Paul,

Thanks for the feedback, and it does work.  So if I understand this, the app
server (Jetty) is not reading in the environment variables for the
other libraries I need.  How do I add the JDBC jars to the path so that I
don't need to copy the files into the directory?  Does Jetty have a config
file I should look at?


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 The driver can be put directly into the WEB-INF/lib of the solr web
 app or it can be put into ${solr.home}/lib dir.
 
 or if something is really screwed up you can try the old fashioned way
 of putting your driver jar into JAVA_HOME/lib/ext
 
 --Noble
 
 
 On Tue, Jan 6, 2009 at 7:05 AM, Performance dcr...@crossview.com wrote:

 I have been following this tutorial but I can't seem to get past an error
 related to not being able to load the DB2 Driver.  The user has all the
 right config to load the JDBC driver and Squirrel works fine.  Do I need
 to
 update and path within Solr?



 muxa wrote:

 Looked through the tutorial on data import, section Full Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails


 --
 View this message in context:
 http://www.nabble.com/DataImport-tp17730791p21301571.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImport-tp17730791p21309725.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImport

2009-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which approach worked? I suggested three:
Jetty automatically loads jars in WEB-INF/lib;
it is the responsibility of Solr to load jars from solr.home/lib;
it is the responsibility of the JRE to load jars from JAVA_HOME/lib/ext.
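
In Solr 1.4 and later there is also a fourth option: point solrconfig.xml at
the jars explicitly (a sketch; the directory is a placeholder):

<!-- in solrconfig.xml: load every jar in the given directory -->
<lib dir="/opt/jdbc-drivers" regex=".*\.jar" />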

On Tue, Jan 6, 2009 at 6:18 PM, Performance dcr...@crossview.com wrote:

 Paul,

 Thanks for the feedback and it does work.  So if I understand this the app
 server code (Jetty) is not reading in the environment variables for the
 other libraries I need.  How do I add the JDBC files to the path so that I
 don't need to copy the files into the directory?  Does jetty have a config
 file I should look at?


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 The driver can be put directly into the WEB-INF/lib of the solr web
 app or it can be put into ${solr.home}/lib dir.

 or if something is really screwed up you can try the old fashioned way
 of putting your driver jar into JAVA_HOME/lib/ext

 --Noble


 On Tue, Jan 6, 2009 at 7:05 AM, Performance dcr...@crossview.com wrote:

 I have been following this tutorial but I can't seem to get past an error
 related to not being able to load the DB2 Driver.  The user has all the
 right config to load the JDBC driver and Squirrel works fine.  Do I need
 to
 update and path within Solr?



 muxa wrote:

 Looked through the tutorial on data import, section Full Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails


 --
 View this message in context:
 http://www.nabble.com/DataImport-tp17730791p21301571.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul



 --
 View this message in context: 
 http://www.nabble.com/DataImport-tp17730791p21309725.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: DataImport

2009-01-05 Thread Performance

I have been following this tutorial but I can't seem to get past an error
related to not being able to load the DB2 Driver.  The user has all the
 right config to load the JDBC driver and Squirrel works fine.  Do I need to
 update any path within Solr?



muxa wrote:
 
 Looked through the tutorial on data import, section Full Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails
 

-- 
View this message in context: 
http://www.nabble.com/DataImport-tp17730791p21301571.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImport

2009-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
The driver can be put directly into the WEB-INF/lib of the solr web
app or it can be put into ${solr.home}/lib dir.

or if something is really screwed up you can try the old fashioned way
of putting your driver jar into JAVA_HOME/lib/ext

--Noble


On Tue, Jan 6, 2009 at 7:05 AM, Performance dcr...@crossview.com wrote:

 I have been following this tutorial but I can't seem to get past an error
 related to not being able to load the DB2 Driver.  The user has all the
 right config to load the JDBC driver and Squirrel works fine.  Do I need to
 update and path within Solr?



 muxa wrote:

 Looked through the tutorial on data import, section Full Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails


 --
 View this message in context: 
 http://www.nabble.com/DataImport-tp17730791p21301571.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Luca Molteni
Have you tried using the

<dynamicField name="*" type="string" indexed="true" />

option in schema.xml? After the indexing, take a look at the
fields DIH has generated.

Bye,

L.M.



2008/12/15 jokkmokk jokkm...@gmx.at:

 HI,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

 <dataConfig>

 <dataSource
   type="JdbcDataSource"
   driver="com.mysql.jdbc.Driver"
   url="jdbc:mysql://localhost/mydb"
   user="root"
   password=""/>


 <document>
   <entity name="phorum_messages" query="select * from phorum_messages">
     <field column="body" name="content"/>
     <field column="subject" name="title"/>
   </entity>
 </document>

 </dataConfig>

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at
 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with title and
 content not body and subject?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan
 --
 View this message in context: 
 http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread jokkmokk

Sorry, I'm using the 1.3.0 release. I've now worked around the issue by
using aliases in the SQL statement so that no mapping is needed. This way it
works perfectly.
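
The alias approach looks roughly like this (assuming an id column; the rest of
the config stays as posted earlier):

<entity name="phorum_messages"
        query="select id, subject as title, body as content from phorum_messages"/>

With the result set columns already named like the schema fields, no <field>
mapping elements are needed.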

best regards

Stefan


Shalin Shekhar Mangar wrote:
 
 Which solr version are you using?
 
-- 
View this message in context: 
http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013639.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Shalin Shekhar Mangar
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:


 HI,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

 dataConfig

 dataSource
type=JdbcDataSource
driver=com.mysql.jdbc.Driver
url=jdbc:mysql://localhost/mydb
user=root
password=/


 document
entity name=phorum_messages query=select * from phorum_messages
field column=body name=content/
field column=subject name=title/
/entity
 /document

 /dataConfig

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at

 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at

 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with title
 and
 content not body and subject?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan
 --
 View this message in context:
 http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImport Hadnler - new bee question

2008-12-02 Thread Jae Joo
I actually found the problem: Oracle returns the field names in upper case.
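
That means the field mappings have to match the upper-case names Oracle
reports, or the columns have to be aliased in the query; a sketch against the
config quoted below:

<entity name="companyqa" pk="id" query="select * from solr_test">
  <field column="ID"   name="id" />
  <field column="TEXT" name="subject" />
</entity>

Alternatively, keep the lower-case mappings and alias in the SQL, e.g.
select id as "id", text as "text" from solr_test.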

On Tue, Dec 2, 2008 at 1:57 PM, Jae Joo [EMAIL PROTECTED] wrote:

 Hey,

 I am trying to connect to the Oracle database and index the values into Solr,
 but I am getting
 Document [null] missing required field: id.

 Here is the debug output.
 <str name="Total Requests made to DataSource">1</str>
 <str name="Total Rows Fetched">2</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Full Dump Started">2008-12-02 13:49:35</str>
 <str name="">
 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
 </str>

 schema.xml
 <field name="id" type="string" indexed="true" stored="true" required="true"
 />
 <field name="subject" type="text" indexed="true" stored="true"
 omitNorms="true"/>

  </fields>
  <uniqueKey>id</uniqueKey>


 data-config.xml

 <dataConfig>
 <dataSource driver="oracle.jdbc.driver.OracleDriver"
 url="jdbc:oracle:thin:@x.x.x.x:" user="..." password="..."/>
 <document name="companyQAIndex">
 <entity name="companyqa" pk="id" query="select * from solr_test">
 <field column="id" name="id" />
 <field column="text" name="subject" />

 </entity>
 </document>
 </dataConfig>

 Database Schema
 id  is the pk.
 There are only 2 rows in the table solr_test.

 Will anyone help me figure out what I am doing wrong?

 Jae




Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread Shalin Shekhar Mangar
Hi David,

I think you meant RegexTransformer instead of NumberFormatTransformer.
Anyhow, the order in which the transformers are applied is the same as the
order in which you specify them.

So make sure your entity has
transformers=RegexTransformer,DateFormatTransformer.

On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
[EMAIL PROTECTED]wrote:


 I'm trying out the dataimport capability.  I have a column that is a series
 of dates separated by spaces like so:
 1996-00-00 1996-04-00
 And I'm trying to import it like so:
 <field column="r_event_date" splitBy=" " dateTimeFormat="yyyy-MM-dd" />

 However this fails and the stack trace suggests it is first trying to apply
 the dateTimeFormat before splitBy.  I think this is a bug... dataimport
 should apply DateFormatTransformer and NumberFormatTransformer last.

 ~ David Smiley
 --
 View this message in context:
 http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread David Smiley @MITRE.org

The wiki didn't mention I can specify multiple transformers.  BTW, it's
transformer (singular), not transformers.  I did mean both NFT and DFT
because I was speaking of the general case, not just mine in particular.  I
thought that the built-in transformers were always in-effect and so I
expected NFT,DFT to occur last.  Sorry if I wasn't clear.

Thanks for your help; it worked.
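
Presumably the working entity ended up looking something like this (the query
and names are placeholders; note the attribute is transformer, singular, and
the order matters):

<entity name="events"
        transformer="RegexTransformer,DateFormatTransformer"
        query="select id, r_event_date from events">
  <field column="r_event_date" splitBy=" " dateTimeFormat="yyyy-MM-dd" />
</entity>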

~ David


Shalin Shekhar Mangar wrote:
 
 Hi David,
 
 I think you meant RegexTransformer instead of NumberFormatTransformer.
 Anyhow, the order in which the transformers are applied is the same as the
 order in which you specify them.
 
 So make sure your entity has
 transformers=RegexTransformer,DateFormatTransformer.
 
 On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
 [EMAIL PROTECTED]wrote:
 

 I'm trying out the dataimport capability.  I have a column that is a
 series
 of dates separated by spaces like so:
 1996-00-00 1996-04-00
 And I'm trying to import it like so:
 field column=r_event_date splitBy=  dateTimeFormat=-MM-dd /

 However this fails and the stack trace suggests it is first trying to
 apply
 the dateTimeFormat before splitBy.  I think this is a bug... dataimport
 should apply DateFormatTransformer and NumberFormatTransformer last.

 ~ David Smiley
 --
 View this message in context:
 http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 
-- 
View this message in context: 
http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dataimport, both splitBy and dateTimeFormat

2008-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Thanks David,
I have updated the wiki documentation
http://wiki.apache.org/solr/DataImportHandler#transformer

The default transformers do not have any special privilege; each is like
any normal user-provided transformer. We just identified some commonly
found use cases and added transformers for them.

Applying a transformer is not very 'cheap': it has to do extra checks
to know whether to apply or not.

On Fri, Oct 17, 2008 at 12:26 AM, David Smiley @MITRE.org
[EMAIL PROTECTED] wrote:

 The wiki didn't mention I can specify multiple transformers.  BTW, it's
 transformer (singular), not transformers.  I did mean both NFT and DFT
 because I was speaking of the general case, not just mine in particular.  I
 thought that the built-in transformers were always in-effect and so I
 expected NFT,DFT to occur last.  Sorry if I wasn't clear.

 Thanks for your help; it worked.

 ~ David


 Shalin Shekhar Mangar wrote:

 Hi David,

 I think you meant RegexTransformer instead of NumberFormatTransformer.
 Anyhow, the order in which the transformers are applied is the same as the
 order in which you specify them.

 So make sure your entity has
 transformers=RegexTransformer,DateFormatTransformer.

 On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org
 [EMAIL PROTECTED]wrote:


 I'm trying out the dataimport capability.  I have a column that is a
 series
 of dates separated by spaces like so:
 1996-00-00 1996-04-00
 And I'm trying to import it like so:
 field column=r_event_date splitBy=  dateTimeFormat=-MM-dd /

 However this fails and the stack trace suggests it is first trying to
 apply
 the dateTimeFormat before splitBy.  I think this is a bug... dataimport
 should apply DateFormatTransformer and NumberFormatTransformer last.

 ~ David Smiley
 --
 View this message in context:
 http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.


 --
 View this message in context: 
 http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: DataImport troubleshooting

2008-09-23 Thread Shalin Shekhar Mangar
Are there any exceptions in the log file when you start Solr?

On Tue, Sep 23, 2008 at 9:31 PM, KyleMorrison [EMAIL PROTECTED] wrote:


 I have searched the forum and the internet at large to find an answer to my
 simple problem, but have been unable. I am trying to get a simple
 dataimport
 to work, and have not been able to. I have Solr installed on an Apache
 server on Unix. I am able to commit and search for files using the usual
 Simple* tools. These files begin with add... and so on.

 On the data import, I have inserted
  <requestHandler name="/dataimport"
 class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/R1/home/shoshana/kyle/Documents/data-config.xml</str>
    </lst>
  </requestHandler>

 into solrconfig, and the data import looks like this:
 <dataConfig>
    <dataSource type="FileDataSource"
 baseUrl="http://helix.ccb.sickkids.ca:8080/" encoding="UTF-8" />
    <document>
    <entity name="page" processor="XPathEntityProcessor" stream="true"
 forEach="/iProClassDatabase/iProClassEntry/"
 url="/R1/home/shoshana/kyle/Documents/exampleIproResult.xml">
    <field column="UniProtKB_Accession"
 xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/UniProtKB/UniProtKB_Accession" />
    <field column="Nomenclature"
 xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Enzyme_Function/EC/Nomenclature" />
    <field column="PMID"
 xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Bibliography/References/PMID" />
    <field column="Sequence_Length"
 xpath="/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length" />
    </entity>
    </document>
 </dataConfig>

 I apologize for the ugly xml. Nonetheless, when I go to
 http://host:8080/solr/dataimport, I get a 404, and when I go to
 http://host:8080/solr/admin/dataimport.jsp and try to debug, nothing
 happens. I have edited out the host name because I don't know if the
 employer would be ok with it. Any guidance?

 Thanks in advance,
 Kyle
 --
 View this message in context:
 http://www.nabble.com/DataImport-troubleshooting-tp19630990p19630990.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImport troubleshooting

2008-09-23 Thread KyleMorrison

Thank you for the help. The problem was actually just stupidity on my part, as it
seems I was running the wrong startup and shutdown shell scripts for the server,
and thus the server was getting restarted. I restarted the server and I can
at least access those pages. I'm getting some wonky output, but I assume
this will be sorted out.

Kyle



Shalin Shekhar Mangar wrote:
 
 Are there any exceptions in the log file when you start Solr?
 
 On Tue, Sep 23, 2008 at 9:31 PM, KyleMorrison [EMAIL PROTECTED] wrote:
 

 I have searched the forum and the internet at large to find an answer to
 my
 simple problem, but have been unable. I am trying to get a simple
 dataimport
 to work, and have not been able to. I have Solr installed on an Apache
 server on Unix. I am able to commit and search for files using the usual
 Simple* tools. These files begin with add... and so on.

 On the data import, I have inserted
  requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
lst name=defaults
  str
 name=config/R1/home/shoshana/kyle/Documents/data-config.xml/str
/lst
  /requestHandler

 into solrconfig, and the data import looks like this:
 dataConfig
dataSource type=FileDataSource
 baseUrl=http://helix.ccb.sickkids.ca:8080/; encoding=UTF-8 /
document
entity name=page processor=XPathEntityProcessor stream=true
 forEach=/iProClassDatabase/iProClassEntry/
 url=/R1/home/shoshana/kyle/Documents/exampleIproResult.xml
field column=UniProtKB_Accession

 xpath=/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/UniProtKB/UniProtKB_Accession
field column=Nomenclature

 xpath=/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Enzyme_Function/EC/Nomenclature
 /
field column=PMID

 xpath=/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Bibliography/References/PMID
 /
field column=Sequence_Length
 xpath=/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length /
/entity
/document
 /dataConfig

 I apologize for the ugly xml. Nonetheless, when I go to
 http://host:8080/solr/dataimport, I get a 404, and when I go to
 http://host:8080/solr/admin/dataimport.jsp and try to debug, nothing
 happens. I have editted out the host name because I don't know if the
 employer would be ok with it. Any guidance?

 Thanks in advance,
 Kyle
 --
 View this message in context:
 http://www.nabble.com/DataImport-troubleshooting-tp19630990p19630990.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImport-troubleshooting-tp19630990p19635170.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImport

2008-06-11 Thread Shalin Shekhar Mangar
Hi Mihails,

The solr home is a directory which contains the conf/ and data/ folders. The
conf folder contains solrconfig.xml, schema.xml and other such configuration
files. The data/ folder contains the index files.

Other than adding the war file to tomcat, you also need to designate a
certain folder as solr home, so that solr knows from where to load its
configuration. By default, solr searches for a folder named solr under the
current working directory (pwd) to use as home. There are other ways of
configuring it, as given in the Solr wiki. Hope that helps.
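
The usual ways are either the solr.solr.home JVM system property or a JNDI
entry in a Tomcat context fragment; a sketch of the latter (paths are
placeholders):

<!-- e.g. tomcat/conf/Catalina/localhost/solr.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/example-solr-home/solr" override="true" />
</Context>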

2008/6/11 Mihails Agafonovs [EMAIL PROTECTED]:

 I've already done that, but cannot access solr via web, and apache log
 says something wrong with solr home directory.
 -
 Couldn't start SOLR. Check solr/home property.
 -
  Quoting Chakraborty, Kishore K. : Mihails,
  Put the solr.war into the webapps directory and restart tomcat, then
 follow up the console and you'll see messages saying solr.war is
 getting deployed.
  Use a recent nightly build as that has the dataimport related patch
 included.
  Regards
  Kishore.
  -Original Message-
  From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, June 11, 2008 1:13 PM
  To: solr-user@lucene.apache.org
  Subject: Re: DataImport
  If I've copied the solr.war under tomcat/webapps directory, after
  restarting it the archive extracts itself and I get solr directory.
  Why do I need to set example-solr-home/solr, which is not in the
  /webapps directory, as home directory?
  Quoting Shalin Shekhar Mangar : No, the steps are as follows:
  1. Download the example-solr-home.jar from the DataImportHandler
  wiki page
  2. Extract it. You'll find a folder named example-solr-home and a
  solr.war
  file after extraction
  3. Copy the solr.war to tomcat_home/webapps. You don't need any
  other solr
  instance. This war is self-sufficient.
  4. You need to set the example-solr-home/solr folder as the solr
  home
  folder. For instructions on how to do that, look at
  http://wiki.apache.org/solr/SolrTomcat
  From the port number of the URL you are trying, it seems that you're
  using
  the Jetty supplied with Solr instead of Tomcat.
  2008/6/9 Mihails Agafonovs :
   I've placed the solr.war under the tomcat directory, restarted
  tomcat
   to deploy the solr.war. But still... there is no .jar, no folder
  named
   example-data-config, and hitting
   http://localhost:8983/solr/dataimport doesn't work.
   Do I need the original Solr instance to use this .war with?
Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar. You
  can
   use the solr.war file. If you really
need a jar, you'll need to use the SOLR-469.patch at
http://issues.apache.org/jira/browse/SOLR-469 and build solr from
   source
after applying that patch.
2. The jar contains a folder named example-solr-home. Please
  check
   again.
Please let me know if you run into any problems.
2008/6/9 Mihails Agafonovs :
 Looked through the tutorial on data import, section Full
  Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as
  your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails
--
Regards,
Shalin Shekhar Mangar.
Ar cieņu, Mihails
  
   Links:
   --
   [1] mailto:[EMAIL PROTECTED]
  
  --
  Regards,
  Shalin Shekhar Mangar.
  Ar cieņu, Mihails
  Links:
  --
  [1] mailto:[EMAIL PROTECTED]
  Ar cieņu, Mihails

 Links:
 --
 [1] mailto:[EMAIL PROTECTED]




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImport

2008-06-11 Thread Mihails Agafonovs
I'm stuck...

I now have /tomcat5.5/webapps/solr (exploded solr.war),
/tomcat5.5/webapps/solr/solr-example/.
I've run

export
JAVA_OPTS=$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/
to make /example/solr/ the home directory.

What am I doing wrong?

 Quoting Shalin Shekhar Mangar : Hi Mihails,
 The solr home is a directory which contains the conf/ and data/
folders. The
 conf folder contains solrconfig.xml, schema.xml and other such
configuration
 files. The data/ folder contains the index files.
 Other than adding the war file to tomcat, you also need to designate
a
 certain folder as solr home, so that solr knows from where to load
it's
 configuration. By default, solr searches for a folder named solr
under the
 current working directory (pwd) to use as home. There are other ways
of
 configuring it as given in solr wiki. Hope that helpes.
 2008/6/11 Mihails Agafonovs :
  I've already done that, but cannot access solr via web, and apache
log
  says something wrong with solr home directory.
  -
  Couldn't start SOLR. Check solr/home property.
  -
   Quoting Chakraborty, Kishore K. : Mihails,
   Put the solr.war into the webapps directory and restart tomcat,
then
  follow up the console and you'll see messages saying solr.war is
  getting deployed.
   Use a recent nightly build as that has the dataimport related
patch
  included.
   Regards
   Kishore.
   -Original Message-
   From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, June 11, 2008 1:13 PM
   To: solr-user@lucene.apache.org
   Subject: Re: DataImport
   If I've copied the solr.war under tomcat/webapps directory, after
   restarting it the archive extracts itself and I get solr
directory.
   Why do I need to set example-solr-home/solr, which is not in the
   /webapps directory, as home directory?
   Quoting Shalin Shekhar Mangar : No, the steps are as follows:
   1. Download the example-solr-home.jar from the DataImportHandler
   wiki page
   2. Extract it. You'll find a folder named example-solr-home and
a
   solr.war
   file after extraction
   3. Copy the solr.war to tomcat_home/webapps. You don't need any
   other solr
   instance. This war is self-sufficient.
   4. You need to set the example-solr-home/solr folder as the solr
   home
   folder. For instructions on how to do that, look at
   http://wiki.apache.org/solr/SolrTomcat
   From the port number of the URL you are trying, it seems that
you're
   using
   the Jetty supplied with Solr instead of Tomcat.
   2008/6/9 Mihails Agafonovs :
I've placed the solr.war under the tomcat directory, restarted
   tomcat
to deploy the solr.war. But still... there is no .jar, no
folder
   named
example-data-config, and hitting
http://localhost:8983/solr/dataimport doesn't work.
Do I need the original Solr instance to use this .war with?
 Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar.
You
   can
use the solr.war file. If you really
 need a jar, you'll need to use the SOLR-469.patch at
 http://issues.apache.org/jira/browse/SOLR-469 and build solr
from
source
 after applying that patch.
 2. The jar contains a folder named example-solr-home. Please
   check
again.
 Please let me know if you run into any problems.
 2008/6/9 Mihails Agafonovs :
  Looked through the tutorial on data import, section Full
   Import
  Example.
  1) Where is this dataimport.jar? There is no such file in
the
  extracted example-solr-home.jar.
  2) Use the solr folder inside example-data-config folder as
   your
  solr home. What does this mean? Anyway, there is no folder
  example-data-config.
   Ar cieņu, Mihails
 --
 Regards,
 Shalin Shekhar Mangar.
 Ar cieņu, Mihails
   
Links:
--
[1] mailto:[EMAIL PROTECTED]
   
   --
   Regards,
   Shalin Shekhar Mangar.
   Ar cieņu, Mihails
   Links:
   --
   [1] mailto:[EMAIL PROTECTED]
   Ar cieņu, Mihails
 
  Links:
  --
  [1] mailto:[EMAIL PROTECTED]
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 Ar cieņu, Mihails

Links:
--
[1] mailto:[EMAIL PROTECTED]

