
Re: Getting started with Solr

2015-03-01 Thread Baruch Kogan

OK, got it, works now.

Maybe you can advise on something more general?

I'm trying to use Solr to analyze HTML data retrieved with Nutch. I want to
crawl a list of web pages built according to a certain template, analyze
certain fields in their HTML (each identified by a span class and consisting
of a number), then output the results as CSV to generate a list with each
website's domain and the sum of the numbers in all the specified fields.

How should I set up the flow? Should I configure Nutch to only pull the
relevant fields from each page, then use Solr to add the integers in those
fields and output to a csv? Or should I use Nutch to pull in everything
from the relevant page and then use Solr to strip out the relevant fields
and process them as above? Can I do the processing strictly in Solr, using
the stuff found here
https://cwiki.apache.org/confluence/display/solr/Indexing+and+Basic+Data+Operations,
or should I use PHP through Solarium or something along those lines?
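
For the "process it outside Solr" option, the per-domain sum can be sketched in a few lines of Python. This is only an illustration of the flow described above, not a Nutch or Solr recipe: it assumes the crawled pages are already available as (url, html) pairs, and the names `SpanNumberParser`, `sum_by_domain`, `to_csv`, and the span class `price` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


class SpanNumberParser(HTMLParser):
    """Collects integers found inside <span> elements carrying one CSS class."""

    def __init__(self, span_class):
        super().__init__()
        self.span_class = span_class  # hypothetical class name, e.g. "price"
        self.depth = 0                # > 0 while inside a matching span
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested spans too, so the matching close tag is found correctly.
        if self.depth or self.span_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text.isdigit():
            self.numbers.append(int(text))


def sum_by_domain(pages, span_class):
    """pages: iterable of (url, html) pairs. Returns {domain: total}."""
    totals = defaultdict(int)
    for url, html in pages:
        parser = SpanNumberParser(span_class)
        parser.feed(html)
        parser.close()
        totals[urlparse(url).netloc] += sum(parser.numbers)
    return dict(totals)


def to_csv(totals):
    """Render the totals as CSV text with a header row, sorted by domain."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["domain", "total"])
    for domain in sorted(totals):
        writer.writerow([domain, totals[domain]])
    return out.getvalue()
```

Whether the extraction happens at crawl time, at index time, or in a script like this one is a design choice; the sketch only shows that the summing and CSV steps are small once the field values are isolated.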

Your advice would be appreciated; I don't want to reinvent the bicycle.

Sincerely,

Baruch Kogan
Marketing Manager
Seller Panda http://sellerpanda.com
+972(58)441-3829
baruch.kogan at Skype


Re: Getting started with Solr

2015-02-28 Thread Baruch Kogan

Thanks for bearing with me.

I start Solr with `bin/solr start -e cloud` with 2 nodes. Then I get this:

*Welcome to the SolrCloud example!*


*This interactive session will help you launch a SolrCloud cluster on your
local workstation.*

*To begin, how many Solr nodes would you like to run in your local cluster?
(specify 1-4 nodes) [2] *
*Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.*

*Please enter the port for node1 [8983] *
*8983*
*Please enter the port for node2 [7574] *
*7574*
*Cloning Solr home directory /home/ubuntu/crawler/solr/example/cloud/node1
into /home/ubuntu/crawler/solr/example/cloud/node2*

*Starting up SolrCloud node1 on port 8983 using command:*

*solr start -cloud -s example/cloud/node1/solr -p 8983   *

I then go to http://localhost:8983/solr/admin/cores and get the following:


*This XML file does not appear to have any style information associated
with it. The document tree is shown below.*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <lst name="initFailures"/>
  <lst name="status">
    <lst name="testCollection_shard1_replica1">
      <str name="name">testCollection_shard1_replica1</str>
      <str name="instanceDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica1/</str>
      <str name="dataDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica1/data/</str>
      <str name="config">solrconfig.xml</str>
      <str name="schema">schema.xml</str>
      <date name="startTime">2015-03-01T06:59:12.296Z</date>
      <long name="uptime">46380</long>
      <lst name="index">
        <int name="numDocs">0</int>
        <int name="maxDoc">0</int>
        <int name="deletedDocs">0</int>
        <long name="indexHeapUsageBytes">0</long>
        <long name="version">1</long>
        <int name="segmentCount">0</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a4f8f8b; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
        <lst name="userData"/>
        <long name="sizeInBytes">71</long>
        <str name="size">71 bytes</str>
      </lst>
    </lst>
    <lst name="testCollection_shard1_replica2">
      <str name="name">testCollection_shard1_replica2</str>
      <str name="instanceDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica2/</str>
      <str name="dataDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica2/data/</str>
      <str name="config">solrconfig.xml</str>
      <str name="schema">schema.xml</str>
      <date name="startTime">2015-03-01T06:59:12.751Z</date>
      <long name="uptime">45926</long>
      <lst name="index">
        <int name="numDocs">0</int>
        <int name="maxDoc">0</int>
        <int name="deletedDocs">0</int>
        <long name="indexHeapUsageBytes">0</long>
        <long name="version">1</long>
        <int name="segmentCount">0</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard1_replica2/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a4f8f8b; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
        <lst name="userData"/>
        <long name="sizeInBytes">71</long>
        <str name="size">71 bytes</str>
      </lst>
    </lst>
    <lst name="testCollection_shard2_replica1">
      <str name="name">testCollection_shard2_replica1</str>
      <str name="instanceDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica1/</str>
      <str name="dataDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica1/data/</str>
      <str name="config">solrconfig.xml</str>
      <str name="schema">schema.xml</str>
      <date name="startTime">2015-03-01T06:59:12.596Z</date>
      <long name="uptime">46081</long>
      <lst name="index">
        <int name="numDocs">0</int>
        <int name="maxDoc">0</int>
        <int name="deletedDocs">0</int>
        <long name="indexHeapUsageBytes">0</long>
        <long name="version">1</long>
        <int name="segmentCount">0</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2a4f8f8b; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
        <lst name="userData"/>
        <long name="sizeInBytes">71</long>
        <str name="size">71 bytes</str>
      </lst>
    </lst>
    <lst name="testCollection_shard2_replica2">
      <str name="name">testCollection_shard2_replica2</str>
      <str name="instanceDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica2/</str>
      <str name="dataDir">/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica2/data/</str>
      <str name="config">solrconfig.xml</str>
      <str name="schema">schema.xml</str>
      <date name="startTime">2015-03-01T06:59:12.718Z</date>
      <long name="uptime">45959</long>
      <lst name="index">
        <int name="numDocs">0</int>
        <int name="maxDoc">0</int>
        <int name="deletedDocs">0</int>
        <long name="indexHeapUsageBytes">0</long>
        <long name="version">1</long>
        <int name="segmentCount">0</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/ubuntu/crawler/solr/example/cloud/node1/solr/testCollection_shard2_replica2/data/index

Re: Getting started with Solr

2015-02-26 Thread Erik Hatcher

I’m sorry, I’m not following exactly.   

Somehow you no longer have a gettingstarted collection, but it is not clear how 
that happened.  

Could you post the exact script steps you used that got you this error?

What collections/cores does the Solr admin show you have? What are the
results of http://localhost:8983/solr/admin/cores ?
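
The same check can be scripted. The sketch below is a hedged illustration, assuming the CoreAdmin status is requested as JSON (`action=STATUS&wt=json`) and has a top-level `status` object keyed by core name; the helper names `fetch_status`, `core_names`, and `has_collection` are hypothetical, and the `<collection>_shardN_replicaM` pattern is taken from the core names reported elsewhere in this thread.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url="http://localhost:8983/solr"):
    """Fetch the CoreAdmin STATUS response as JSON text (needs a running Solr)."""
    with urlopen(base_url + "/admin/cores?action=STATUS&wt=json") as resp:
        return resp.read().decode("utf-8")


def core_names(status_json):
    """Return the sorted core names listed under the 'status' key."""
    data = json.loads(status_json)
    return sorted(data.get("status", {}))


def has_collection(status_json, collection):
    """True if a core has this exact name or looks like one of its replicas."""
    prefix = collection + "_shard"
    return any(
        name == collection or name.startswith(prefix)
        for name in core_names(status_json)
    )
```

If `has_collection(fetch_status(), "gettingstarted")` comes back false, a 404 on `/solr/gettingstarted/update` is what one would expect, since no core is serving that name.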

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com/





Re: Getting started with Solr

2015-02-26 Thread Baruch Kogan

Oh, I see. I used the start -e cloud command, then ran through a setup with
one core and default options for the rest, then tried to post the JSON
example again, and got another error:

ubuntu@ubuntu-VirtualBox:~/crawler/solr$ bin/post -c gettingstarted example/exampledocs/*.json
/usr/lib/jvm/java-7-oracle/bin/java -classpath /home/ubuntu/crawler/solr/dist/solr-core-5.0.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/books.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.json (application/json) to [base]
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/gettingstarted/update. Reason:
<pre>Not Found</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>

Sincerely,

Baruch Kogan
Marketing Manager
Seller Panda http://sellerpanda.com
+972(58)441-3829
baruch.kogan at Skype


Re: Getting started with Solr

2015-02-26 Thread Erik Hatcher

How did you start Solr?   If you started with `bin/solr start -e cloud` you’ll 
have a gettingstarted collection created automatically, otherwise you’ll need 
to create it yourself with `bin/solr create -c gettingstarted`


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com/





Getting started with Solr

2015-02-26 Thread Baruch Kogan

Hi, I've just installed Solr (I will be controlling it with Solarium and
using it to search Nutch queries). I'm working through the starting tutorials
described here:
https://cwiki.apache.org/confluence/display/solr/Running+Solr

When I try to run `bin/post -c gettingstarted example/exampledocs/*.json`,
I get a bunch of errors having to do with there not being a gettingstarted
folder in /solr/. Is this normal? Should I create one?

Sincerely,

Baruch Kogan
Marketing Manager
Seller Panda http://sellerpanda.com
+972(58)441-3829
baruch.kogan at Skype

Getting Started with Enterprise Search using Apache Solr

2014-07-28 Thread Xavier Morera

Hi. Most of the members here are already seasoned search professionals.
However I believe there may also be a few who joined because they want to
get started on search and IMHO, probably like you, Solr is the best way to
start.

Therefore I wanted to post a link to a course that I created, Getting
Started with Enterprise Search using Apache Solr. For some it might be a good
way to start learning. If you are already a search professional you may
not benefit greatly, but any feedback you can provide would be
great, as I want to create more trainings to help people get started on
search.

It is a Pluralsight training, so if you are not a subscriber, just create a
trial account and you have 10 days to watch. If you have questions, let me
know. You can reach me through here or @xmorera on Twitter.

Here is the course
http://pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr

PS: Pluralsight is also a great way to learn so I really recommend it.

Getting Started with Enterprise Search using Apache Solr pluralsight.com

Search is one of the most misunderstood functionalities in the IT industry.
Even further, Enterprise Search used to be neither for the faint of heart,
nor for those with a thin wallet. However, since the introduction of Apache
Solr, the name of the game has changed. Don't leave home without it!

--
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera

newbie getting started with solr

2013-11-07 Thread Palmer, Eric

Sorry if this is obvious (because it isn't for me)

I want to build a solr (4.5.1) + nutch (1.7.1) environment.  I'm doing this on 
amazon linux (I may put nutch on a separate server eventually).

Please let me know if my thinking is sound or off base

In the example folder are a lot of files and folders, including the war file
and start.jar:

drwxr-xr-x   cloud-scripts
drwxr-xr-x   contexts
drwxr-xr-x   etc
drwxr-xr-x   example-DIH
drwxr-xr-x   exampledocs
drwxr-xr-x   example-schemaless
drwxr-xr-x   lib
drwxr-xr-x   logs
drwxr-xr-x   multicore
-rw-r--r--   README.txt
drwxr-xr-x   resources
drwxr-xr-x   solr
drwxr-xr-x   solr-webapp
-rw-r--r--   start.jar
drwxr-xr-x   webapps


I am creating a separate folder for the conf and data folders (on another disk) 
and placing these files in the conf folder:

schema-solr.xml (from nutch) renamed to schema.solr
solrconfig.xml

I will use the example folder and start.jar from that location. (Is this okay?)

Where do I set the collection name?

What else do I need to do to get a basic web page indexer built? (I'll work out 
the crawling later; I just want to be able to manually add some documents and 
query.) I'm trying to understand solr first and then will use nutch.

I have several books and have looked at the tutorial and other web sites. It 
seems they assume that I know where to begin when creating a new collection and 
customizing it.

Thanks in advance for your help.

--
Eric Palmer
Web Services
U of Richmond

To report technical issues, obtain technical support or make requests for 
enhancements please visit http://web.richmond.edu/contact/technical-support.html

Re: newbie getting started with solr

2013-11-07 Thread Tom Mortimer

Hi Eric,

Solr configuration can certainly be confusing at first. And for some time
after. :P

If you're running start.jar from the example folder (which is fine for
testing, and I've known some people to use it for production systems) then
the default solr home is example/solr.  This contains solr.xml, which
specifies where to find per-core configuration and data. (A core is
equivalent to a collection in a simple non-sharded setup).

For now, the easiest thing would be to use the default core in
example/solr/collection1. Copy your solrconfig.xml and schema.xml over the
ones in collection1/conf (backing up the originals for reference). Create
your data directory wherever you like and symlink it into collection1.

Now when you run $ java -jar start.jar in example/, you should be able to
access Solr at http://localhost:8983/solr/ , and add and search for
documents.
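
As a rough illustration of "add and search for documents" over HTTP, here is a minimal Python sketch. It assumes the default core name collection1 mentioned above, that the `/update` handler accepts a JSON array of documents, and that the schema has `id` and `title` fields; the helper names are hypothetical, and nothing is sent until the request is actually opened.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL: default port plus the default core from the example setup.
SOLR = "http://localhost:8983/solr/collection1"


def build_update_request(docs, base=SOLR):
    """Build a POST request that adds docs (a list of dicts) and commits."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def build_select_url(query, rows=10, base=SOLR):
    """Build a search URL for the /select handler, asking for JSON output."""
    return base + "/select?" + urlencode({"q": query, "wt": "json", "rows": rows})


def send(request_or_url):
    """Execute a request against a running Solr and return the decoded body."""
    with urlopen(request_or_url) as resp:
        return resp.read().decode("utf-8")
```

With Solr running, `send(build_update_request([{"id": "1", "title": "hello"}]))` followed by `send(build_select_url("title:hello"))` would be the manual add-then-query round trip; the field names must match whatever schema.xml is in place.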

Hope that helps a bit!

Tom




Re: newbie getting started with solr

2013-11-07 Thread Alexandre Rafalovitch

Tried my book? It should explain that. You can see the collections with
examples in GitHub:
https://github.com/arafalov/solr-indexing-book/tree/master/published

Start from collection1.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Re: Unable to getting started with SOLR

2013-09-18 Thread Furkan KAMACI

I suggest you start here:
http://wiki.apache.org/solr/HowToCompileSolr


Re: Re: Unable to getting started with SOLR

2013-09-14 Thread Erick Erickson

If you're using the default jetty container, there's no log unless
you set it up; the content is echoed to the screen.

About a zillion people have downloaded this and started it
running without issue, so you need to give us the exact
steps you followed.

If you checked the code out from SVN, you need to build it:
go into solrhome/solr and execute

ant example dist

The dist bit isn't strictly necessary, but it builds the jars
that you link to if you try to develop custom plugins etc.

Best,
Erick



Re: Re: Unable to getting started with SOLR

2013-09-13 Thread Rah1x

I have the same issue. Can anyone tell me if they found a solution?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p4089761.html
Sent from the Solr - User mailing list archive at Nabble.com.

Getting started with solr 4.2 and cassandra

2013-04-01 Thread Utkarsh Sengar

Hello,

I am evaluating solr 4.2 and ElasticSearch (I am new to both) for a search
API, where data sits in cassandra.

Getting started with elasticsearch is pretty straightforward, and within a
day I was able to write an ES river
(http://www.elasticsearch.org/guide/reference/river/) which pulls data from
cassandra and indexes it in ES.

Now I am trying to implement something similar with solr and compare both of
them.

Getting started with the solr example
(http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and an
example solr instance works. But the example folder contains a whole bunch of
stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I
don't need 53 directories and 527 files.

So my questions are:
1. How can I get a bare-bones solr app up and running with a minimum set of
configuration? (I will build on it when needed, taking reference from
/example)
2. What is a best practice for running solr in production? Is an approach
like this jetty+nginx one recommended:
http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple solr instance:
3. What is the general practice to import data to solr? For now, I am
writing a python script which will read data in bulk from cassandra and
throw it to solr.
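
The "read in bulk and throw it to solr" script can be sketched independently of the Cassandra side. The Python below is a minimal sketch under stated assumptions: `rows` is any iterable of dicts already read from the source, the field names in the usage are made up, and each yielded body is meant to be POSTed to Solr's `/update` handler as JSON by whatever HTTP client the script uses. The helper names `batches` and `to_update_payloads` are hypothetical.

```python
import json
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_update_payloads(rows, size=500):
    """Yield one JSON request body (bytes) per batch of documents."""
    for chunk in batches(rows, size):
        yield json.dumps(chunk).encode("utf-8")
```

Batching keeps memory bounded when the source table is large; committing once after the last batch, rather than per batch, is a common choice but is left to the caller here.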

-- 
Thanks,
-Utkarsh

Re: Getting started with solr 4.2 and cassandra

2013-04-01 Thread Jack Krupansky

You might want to check out DataStax Enterprise, which actually integrates 
Cassandra and Solr. You keep the data in Cassandra, but as data is added and 
updated and deleted, the Solr index is automatically updated in parallel. 
You can add and update data and query using either the Cassandra API or the 
Solr API.


See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky


Re: Getting started with solr 4.2 and cassandra

2013-04-01 Thread Utkarsh Sengar

Thanks for the reply. DSE is one of the options, and I am looking into
that too.

However, before diving into solr+cassandra integration (which comes out of
the box with DSE), I am just trying to set up a solr instance on my local
machine without the bloat of the example solr instance. Any suggestions
about that?

Thanks,
-Utkarsh

On Mon, Apr 1, 2013 at 4:00 PM, Jack Krupansky j...@basetechnology.comwrote:

You might want to check out DataStax Enterprise, which actually integrates
Cassandra and Solr. You keep the data in Cassandra, but as data is added
and updated and deleted, the Solr index is automatically updated in
parallel. You can add and update data and query using either the Cassandra
API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky


Re: Getting started with solr 4.2 and cassandra

2013-04-01 Thread Jack Krupansky

The Solr example really is rather simple. Download, unzip, run, add data, 
query. It's really that simple. Make sure you are looking at the Solr 
tutorial:


http://lucene.apache.org/solr/4_2_0/tutorial.html

Download from here:
http://lucene.apache.org/solr/tutorial.html

-- Jack Krupansky


Re: Getting started with solr 4.2 and cassandra

2013-04-01 Thread Otis Gospodnetic

Hi,

Solr doesn't have anything like ES River.  DIH (DataImportHandler)
feels like the closest thing in Solr, though it's not quite the same
thing.  DIH pulls in data like a typical River does, but most people
have external indexers that push data into Solr using one of its
client libraries to talk to Solr, such as SolrJ.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
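
As a rough illustration of the "external indexer that pushes data into Solr" approach described above, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption for illustration and not from this thread: the update URL, the field names, the batch size, and the fetch_rows() stand-in that takes the place of a real Cassandra read. It also assumes a Solr version whose /update handler accepts a JSON array of documents.

```python
import json
import urllib.request
from itertools import islice

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumed URL, adjust to your setup


def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_update_request(docs, url=SOLR_UPDATE_URL, commit=False):
    """Build (but do not send) a POST request carrying a JSON array of docs."""
    target = url + ("?commit=true" if commit else "")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        target,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send one request and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


def index_rows(rows, batch_size=500):
    """Push rows to Solr in batches, committing only with the last batch."""
    previous = None
    for batch in chunked(rows, batch_size):
        if previous is not None:
            send(build_update_request(previous))
        previous = batch
    if previous is not None:
        send(build_update_request(previous, commit=True))


def fetch_rows():
    """Hypothetical stand-in for a bulk read from the real data store."""
    yield {"id": "1", "name": "first row"}
    yield {"id": "2", "name": "second row"}
```

Calling index_rows(fetch_rows()) would send the two sample documents in a single batch. Committing once at the end rather than per batch is a design choice here, made to keep the number of commits low.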






Re: Getting started with indexing a database

2012-01-15 Thread Rakesh Varna

Hi Mike,
   Can you try removing <field column="doc_id" name="DOC_ID"/> from the
nested entities? Just keep it in the top-level entity.

Regards,
Rakesh Varna
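
One way to spot field mappings like the one mentioned above is to compare each entity's field columns against the columns its SELECT actually returns. The following Python sketch does that for a trimmed copy of the config from this thread. The helper names are made up for illustration, the SELECT parsing is deliberately naive (simple comma-separated column lists only), and it is not established here that such a mismatch is the cause of the reported problem.

```python
import re
import xml.etree.ElementTree as ET

# Captures the column list between SELECT and FROM in a simple query.
SELECT_RE = re.compile(r"select\s+(.*?)\s+from\s", re.IGNORECASE | re.DOTALL)


def selected_columns(query):
    """Return the column names (or aliases) listed between SELECT and FROM."""
    match = SELECT_RE.search(query)
    if not match:
        return set()
    # "t.col AS alias" -> "alias"; "t.col" -> "col"
    return {col.strip().split()[-1].split(".")[-1] for col in match.group(1).split(",")}


def unmapped_fields(config_xml):
    """Map entity name -> field columns that the entity's own SELECT does not return."""
    problems = {}
    for entity in ET.fromstring(config_xml).iter("entity"):
        columns = selected_columns(entity.get("query", ""))
        if "*" in columns:
            continue  # SELECT * returns everything; nothing to check
        missing = [
            field.get("column")
            for field in entity.findall("field")  # direct children only
            if field.get("column") not in columns
        ]
        if missing:
            problems[entity.get("name")] = missing
    return problems


# Trimmed copy of the data-config.xml discussed in this thread.
SAMPLE = """
<dataConfig>
  <document name="bioscope">
    <entity name="docs" query="SELECT doc_id, type FROM bioscope.docs">
      <field column="doc_id" name="DOC_ID"/>
      <field column="type" name="DOC_TYPE"/>
      <entity name="codes"
              query="SELECT id, origin, type, code FROM bioscope.codes WHERE doc_id='${docs.doc_id}'">
        <field column="id" name="CODE_ID"/>
        <field column="doc_id" name="DOC_ID"/>
      </entity>
    </entity>
  </document>
</dataConfig>
"""

print(unmapped_fields(SAMPLE))  # prints {'codes': ['doc_id']}
```

Here the nested "codes" entity maps a doc_id column that its query never selects, which matches the mapping singled out above.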


Re: Getting started with indexing a database

2012-01-11 Thread Erick Erickson

I'm not going to be much help here since DIH is a mystery to me; I usually go
with a SolrJ program when DIH gets beyond simple cases. But have you
seen:
http://wiki.apache.org/solr/DataImportHandler#interactive

It's a tool that helps you see what's going on with your query.

Best
Erick


Re: Getting started with indexing a database

2012-01-11 Thread Gora Mohanty

On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary tmole...@uw.edu wrote:
[...]
 My data-config.xml file looks like this:

 <dataConfig>
   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost:3306/bioscope" user="db_user"
               password=""/>
   <document name="bioscope">
     <entity name="docs" pk="doc_id"
             query="SELECT doc_id, type FROM bioscope.docs"
             deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified >
                         '${dataimporter.last_index_time}'">
       <field column="doc_id" name="DOC_ID"/>
       <field column="type" name="DOC_TYPE"/>

Your SELECT above does not include the field type

       <entity name="codes" pk="id"
               query="SELECT id, origin, type, code FROM bioscope.codes
                      WHERE doc_id='${docs.doc_id}'"
 ^^ This should be: WHERE id=='${docs.doc_id}' as 'id' is what
you are selecting in this entity.

Same issue for the second nested entity, i.e., replace doc_id= with id=

Regards,
Gora

Getting started with indexing a database

2012-01-09 Thread Mike O'Leary

I am trying to index the contents of a database for the first time, and I am 
only getting the primary key of the table represented by the top level entity 
in my data-config.xml file to be indexed. The database I am starting with has 
three tables:

The table called docs has columns called doc_id, type and last_modified. The 
primary key is doc_id.
The table called codes has columns called id, doc_id, origin, type, code and 
last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
column in the docs table.
The table called texts has columns called id, doc_id, origin, type, text and 
last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
column in the docs table.

My data-config.xml file looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/bioscope" user="db_user"
              password=""/>
  <document name="bioscope">
    <entity name="docs" pk="doc_id"
            query="SELECT doc_id, type FROM bioscope.docs"
            deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified >
                        '${dataimporter.last_index_time}'">
      <field column="doc_id" name="DOC_ID"/>
      <field column="type" name="DOC_TYPE"/>
      <entity name="codes" pk="id"
              query="SELECT id, origin, type, code FROM bioscope.codes
                     WHERE doc_id='${docs.doc_id}'"
              deltaQuery="SELECT doc_id FROM bioscope.codes WHERE last_modified >
                          '${dataimporter.last_index_time}'"
              parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE
                                doc_id='${codes.doc_id}'">
        <field column="id" name="CODE_ID"/>
        <field column="doc_id" name="DOC_ID"/>
        <field column="origin" name="CODE_ORIGIN"/>
        <field column="type" name="CODE_TYPE"/>
        <field column="code" name="CODE_VALUE"/>
      </entity>
      <entity name="notes" pk="id"
              query="SELECT id, origin, type, text FROM bioscope.texts
                     WHERE doc_id='${docs.doc_id}'"
              deltaQuery="SELECT doc_id FROM bioscope.texts WHERE last_modified >
                          '${dataimporter.last_index_time}'"
              parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE
                                doc_id='${texts.doc_id}'">
        <field column="id" name="NOTE_ID"/>
        <field column="doc_id" name="DOC_ID"/>
        <field column="origin" name="NOTE_ORIGIN"/>
        <field column="type" name="NOTE_TYPE"/>
        <field column="text" name="NOTE_TEXT"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I added these lines to the schema.xml file:

<field name="DOC_ID" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="DOC_TYPE" type="string" indexed="true" omitNorms="true" stored="true"/>

<field name="CODE_ID" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="CODE_ORIGIN" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="CODE_TYPE" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="CODE_VALUE" type="string" indexed="true" omitNorms="true" stored="true"/>

<field name="NOTE_ID" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="NOTE_ORIGIN" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="NOTE_TYPE" type="string" indexed="true" omitNorms="true" stored="true"/>
<field name="NOTE_TEXT" type="text_ws" indexed="true" omitNorms="true" stored="true"/>

...

<uniqueKey>DOC_ID</uniqueKey>
<defaultSearchField>NOTE_TEXT</defaultSearchField>

When I run the full-import operation, only the DOC_ID values are written to the 
index. When I run a program that dumps the index contents as an xml string, the 
output looks like this:

<?xml version="1.0" ?>
<documents>
  <document>
    <field name="DOC_ID" value="97634811">
    </field>
  </document>
  <document>
    <field name="DOC_ID" value="97634910">
    </field>
  </document>
...
</documents>

Since this is new to me, I am sure that I have simply left something out or 
specified something the wrong way, but I haven't been able to spot what I have 
been doing wrong when I have gone over the configuration files that I am using. 
Can anyone help me figure out why the other database contents are not being 
indexed?
Thanks,
Mike

Unable to getting started with SOLR

2011-11-10 Thread dsy99


Hi all,
 Sorry for any inconvenience, but I need help with the following.

I want to work with Solr, so I downloaded it and started to
follow the instructions in the tutorial available at
http://lucene.apache.org/solr/tutorial.html to run some examples
first.
But when I tried to check whether Solr is running by opening
http://localhost:8983/solr/admin/ in the web browser, I got the following
message.
  I would be thankful if someone could suggest a solution.
 
 Message:


Unable to connect

  Firefox can't establish a connection to the server at localhost:8983.

 The site could be temporarily unavailable or too busy. Try again in a few 
moments.
  If you are unable to load any pages, check your computer's network
connection.
  If your computer or network is protected by a firewall or proxy, make sure
that Firefox is permitted to access the Web.
_

With Regds:
Divakar

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p3497276.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Unable to getting started with SOLR

2011-11-10 Thread Per Newgro


Did you start the server (java -jar start.jar)? Was it successful? Have you
checked the logs?


Re: Unable to getting started with SOLR

2011-11-10 Thread Per Newgro

Sounds strange. Did you do java -jar start.jar on the console?

On 10.11.2011 18:19, dsy99 wrote:

Yes, I ran the start.jar in the example folder, but I am not getting any
message after that. I checked the logs too; they are empty.


--
View this message in context:
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p3497364.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Re: Unable to getting started with SOLR

2011-11-10 Thread kingkong

Try replacing localhost with your domain or ip address and make sure the
port is open. Use the ps command to see if java is running.
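
For the "make sure the port is open" check, a small Python sketch like the one below can confirm whether anything is accepting TCP connections on the Solr port at all. The function name is made up for illustration, and the host and port are just the defaults mentioned in this thread.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Solr port reachable:", is_port_open("localhost", 8983))
```

If this reports False right after running java -jar start.jar, the server most likely never started listening, and the console output of that command is the next place to look.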

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p3497583.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Getting started with Velocity

2011-07-06 Thread Chip Calhoun

Thanks.  Is there any way to change what fields browse uses / asks for?  I've 
tried changing the code, and I'm clearly missing something.  I either get the 
same fields it was displaying before (and no search results) or I get something 
that doesn't work at all.


Getting started with Velocity

2011-07-01 Thread Chip Calhoun

I'm a Solr novice, so I hope I'm missing something obvious.  When I run a 
search in the Admin view, everything works fine.  When I do the same search in 
http://localhost:8983/solr/browse , I invariably get 0 results found.  What 
am i missing?  Are these not supposed to be searching the same index?
 
Thanks,
Chip

Re: Getting started with Velocity

2011-07-01 Thread Way Cool

By default, browse is using the following config:
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>

    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>

    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
    <int name="mlt.count">3</int>

    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>

    <str name="facet">on</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.query">ipod</str>
    <str name="facet.query">GB</str>
    <str name="facet.mincount">1</str>
    <str name="facet.pivot">cat,inStock</str>
    <str name="facet.range">price</str>
    <int name="f.price.facet.range.start">0</int>
    <int name="f.price.facet.range.end">600</int>
    <int name="f.price.facet.range.gap">50</int>
    <str name="f.price.facet.range.other">after</str>
    <str name="facet.range">manufacturedate_dt</str>
    <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
    <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
    <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
    <str name="f.manufacturedate_dt.facet.range.other">before</str>
    <str name="f.manufacturedate_dt.facet.range.other">after</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
  <!--
  <str name="url-scheme">httpx</str>
  -->
</requestHandler>

while the normal search is using the following:
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

Just make sure the fields that browse refers to are also defined in your docs;
otherwise change it to not use dismax. :-)
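
To make that check concrete, the Python sketch below parses the qf string from the /browse defaults and lists which of those fields a sample document lacks. The sample document and the helper names are hypothetical, chosen only for illustration.

```python
def parse_qf(qf):
    """Turn 'text^0.5 name^1.2' into {'text': 0.5, 'name': 1.2}; boost defaults to 1.0."""
    boosts = {}
    for token in qf.split():
        field, _, boost = token.partition("^")
        boosts[field] = float(boost) if boost else 1.0
    return boosts


def missing_qf_fields(qf, doc):
    """List the qf fields that a sample document does not contain, sorted by name."""
    return sorted(field for field in parse_qf(qf) if field not in doc)


# qf value copied from the /browse defaults shown above.
BROWSE_QF = "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4"

# Hypothetical document with fields that differ from the example schema.
sample_doc = {"id": "doc-1", "title": "Oral history interview", "body": "..."}

print(missing_qf_fields(BROWSE_QF, sample_doc))
# prints ['cat', 'features', 'manu', 'name', 'sku', 'text']
```

Treat the result as a hint only: a field that is indexed but not stored (often the case for a catch-all field such as text) will not appear in a returned document even though it is searchable.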



getting started

2011-06-16 Thread Mari Masuda

Hello,

I am new to Solr and am in the beginning planning stage of a large project and 
could use some advice so as not to make a huge design blunder that I will 
regret down the road.

Currently I have about 10 MySQL databases that store information about 
different archival collections.  For example, we have data and metadata about a 
political poster collection, a television program, documents and photographs of 
and about a famous author, etc.  My job is to work with the staff archivists to 
come up with a standard metadata template so the 10 databases can be 
consolidated into one.  

Currently the info in these databases is accessed through 10 different sets of 
PHP pages that were written a long time ago for PHP 4.  My plan is to write a 
new Java application that will handle both public display of the info as well 
as an administrative interface so that staff members can add or edit the 
records.

I have decided to use Solr as the search mechanism for this project.  Because 
the info in each of our 10 collections is slightly different (e.g., a record 
about a poster does not contain duration information, but a record about a TV 
show does) I was thinking it would be good to separate each collection's index 
into a separate Solr core so that commits coming from one collection do not bog 
down the other unrelated collections.  One reservation I have is that 
eventually we would like to be able to type in "Iraq" and find records across 
all of the collections at once instead of having to search each collection 
separately.  Although I don't know anything about it at this stage, I did 
Google "sharding" after reading someone's recent post on this list and it 
sounds like that may be a potential answer to my question.  Does anyone have 
any advice on how I should initially set up Solr for my situation?  I am slowly 
making my way through the wiki and RTFMing, but I wanted to see what the 
experts have to say because at this point I don't really know where to start.

Thank you very much,
Mari

Re: getting started

2011-06-16 Thread Jonathan Rochkind

On 6/16/2011 4:41 PM, Mari Masuda wrote:

One reservation I have is that eventually we would like to be able to type in "Iraq" and
find records across all of the collections at once instead of having to search each collection
separately. Although I don't know anything about it at this stage, I did Google
"sharding" after reading someone's recent post on this list and it sounds like that may
be a potential answer to my question.

So this kind of stuff can be tricky, but with that eventual requirement
I would NOT put these in separate cores. Sharding isn't (IMO, if someone
disagrees, they will hopefully say so!) a good answer to searching
across entirely different 'schemas', or avoiding frequent-commit issues
-- sharding is really just for scaling/performance when your index gets
very very large. (Which it doesn't sound like yours will be, but you can
deal with that as a separate issue if it becomes so).

If you're going to want to search across all the collections, put them
all in the same core. Either in the exact same indexed fields, or using
certain common indexed fields -- those common ones are the ones you'll
be able to search across all collections on. It's okay if some
collections have unique indexed fields too --- documents in the core
that don't belong to that collection just won't have any terms in that
indexed field that is only used by a certain collection, no problem.
(Then you can distribute this single core into shards if you need to for
performance reasons related to number of documents/size of index).
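
A minimal sketch of the single-core layout described above, in Python for
illustration only. The field names (collection, title, text, duration) are
hypothetical and not taken from Mari's data; q and fq are the standard Solr
query and filter-query parameters.

```python
from urllib.parse import urlencode

# Field names are hypothetical. "id", "collection", "title" and "text" are
# the common fields; "duration" is used only by the TV-program collection.
DOCS = [
    {"id": "poster-1", "collection": "posters",
     "title": "Anti-war poster", "text": "Poster protesting the war in Iraq"},
    {"id": "tv-1", "collection": "tv",
     "title": "News special", "text": "Report from Iraq", "duration": 3600},
]


def select_params(term, collection=None):
    """Build the query string for a search against one shared core.

    Searching the common "text" field matches documents from every
    collection; an optional filter query (fq) narrows the results to a
    single collection.
    """
    params = [("q", "text:" + term)]
    if collection is not None:
        params.append(("fq", "collection:" + collection))
    return urlencode(params)


print(select_params("Iraq"))                        # search all collections
print(select_params("Iraq", collection="posters"))  # posters only
```

The poster document simply has no duration field, which is the "no
problem" case mentioned above.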

You're right to be thinking about the fact that very frequent commits
can be performance issues in Solr. But separating them into different cores
is going to create more problems for you (if you want to be able to
search across all collections), in an attempt to solve that one.
(Among other things, not every Solr feature works in a
distributed/sharded environment, it's just a more complicated and
somewhat less mature setup for Solr).

The way I deal with the frequent-commit issue is by NOT doing frequent
commits to my production Solr. Instead, I use Solr replication to have a
'master' Solr index that I do commits to whenever I want, and a 'slave'
Solr index that serves the production searches, and which only
replicates from master periodically -- not so often that it amounts to
too-frequent commits. That seems to be a somewhat common solution, if
that use pattern works for you.
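
For illustration only, a sketch of asking the slave to pull the latest
index from the master through the replication handler's fetchindex
command. The base URL and the /replication handler path are assumptions
about a typical single-core setup; in practice the slave is more often
configured with a poll interval in solrconfig.xml instead of being
triggered by hand.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_base_url):
    """Return the URL that asks a slave to replicate from its master now.

    slave_base_url is an assumed value such as "http://localhost:8983/solr";
    adjust the host, port and path to match the actual slave.
    """
    query = urlencode({"command": "fetchindex"})
    return slave_base_url.rstrip("/") + "/replication?" + query


# Print the URL; an HTTP GET against it would trigger the pull.
print(fetchindex_url("http://localhost:8983/solr"))
```

Calling this from a scheduled job after a batch of commits on the master
is one way to keep production searches away from frequent commits.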

There are also some near real time features in more recent versions of
Solr, that I'm not very familiar with. (not sure if any are included in
the current latest release, or if they are all only still in the repo)
My sense is that they too only work for certain use patterns; they
aren't magic bullets that let you commit whatever you want to Solr as
often as you want. In general Solr isn't so great at very frequent major changes
to the index. Depending on exactly what sort of use pattern you are
predicting/planning for your commits, maybe people can give you advice
on how (or if) to do it.

But I personally don't think your idea of splitting your collections
(that you'll eventually want to search across in a single search)
into shards is a good solution to frequent-commit issues. You'd be
complicating your setup and causing other problems for yourself, and not
really even entirely addressing the too-frequent-commit issue with that
setup.

Re: getting started

2011-06-16 Thread Sascha SZOTT

Hi Mari,

it depends ...

* How many records are stored in your MySQL databases?
* How often will updates occur?
* How many db records / index documents are changed per update?

I would suggest starting with a single Solr core. That way, you can
concentrate on the basics and do not need to deal with more advanced
things like sharding. In case you encounter performance issues later on,
you can switch to a multi-core setup.

-Sascha

Mari Masuda wrote:

Hello,

I am new to Solr and am in the beginning planning stage of a large project and
could use some advice so as not to make a huge design blunder that I will
regret down the road.

Currently I have about 10 MySQL databases that store information about
different archival collections. For example, we have data and metadata about a
political poster collection, a television program, documents and photographs of
and about a famous author, etc. My job is to work with the staff archivists to
come up with a standard metadata template so the 10 databases can be
consolidated into one.

Currently the info in these databases is accessed through 10 different sets of
PHP pages that were written a long time ago for PHP 4. My plan is to write a
new Java application that will handle both public display of the info as well
as an administrative interface so that staff members can add or edit the
records.

I have decided to use Solr as the search mechanism for this project. Because the info in each of
our 10 collections is slightly different (e.g., a record about a poster does not contain duration
information, but a record about a TV show does) I was thinking it would be good to separate each
collection's index into a separate Solr core so that commits coming from one collection do not bog
down the other unrelated collections. One reservation I have is that eventually we would like to
be able to type in Iraq and find records across all of the collections at once instead
of having to search each collection separately. Although I don't know anything about it at this
stage, I did Google sharding after reading someone's recent post on this list and it
sounds like that may be a potential answer to my question. Does anyone have any advice on how I
should initially set up Solr for my situation? I am slowly making my way through the wiki and
RTFMing, but I wanted to see what the experts have to say because at this point I don't really
know where to start.

Thank you very much,
Mari
