track unused parts of config, schema

2012-06-08 Thread bryan rasmussen
Hi,

Our configs and schemas are quite big. Are there any tools, code snippets
(in any language) or methodologies that people use to clean them
up?

By methodologies I mainly mean things to look for that are almost
always present and almost never used, so I can check those
first.

Thanks,
Bryan Rasmussen
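One low-effort starting point on the schema side is field types that are declared but never referenced by any field or dynamicField. These tend to be left over from the example schema. The Python sketch below uses only the standard library and reports such types from a schema.xml string. The helper name `unused_field_types` is hypothetical, not a Solr tool. The sketch assumes the usual `<fieldType>`/`<field>`/`<dynamicField>` layout, and it does not look at solrconfig.xml or query-time usage.

```python
import xml.etree.ElementTree as ET


def unused_field_types(schema_xml) -> list:
    """Return the names of field types that no field or dynamicField references.

    schema_xml may be str or bytes (bytes is safest if the file has an XML declaration).
    """
    root = ET.fromstring(schema_xml)
    # Older schemas may spell the element <fieldtype> as well as <fieldType>.
    declared = [el.get("name")
                for tag in ("fieldType", "fieldtype")
                for el in root.iter(tag)]
    used = {el.get("type")
            for tag in ("field", "dynamicField")
            for el in root.iter(tag)}
    return sorted(name for name in declared if name not in used)
```

To run it against a real file, read it as bytes, e.g. `unused_field_types(open("schema.xml", "rb").read())`. The path is an assumption.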


getTransformer error

2011-06-10 Thread bryan rasmussen
Hi,
I am trying to transform the results using XSLT. I store my XSLT files in
conf/xslt/

I call them in the querystring with the parameters

wt=xslt&tr=result.xsl

And get back an error:

 getTransformer fails in getContentType

java.lang.RuntimeException: getTransformer fails in getContentType
...
Caused by: java.io.IOException: Unable to initialize Templates 'result.xsl'
...
Caused by: javax.xml.transform.TransformerConfigurationException:
Could not compile stylesheet

I'm assuming it is not an XSLT issue, as I am able to run the
transformation from the command line with Xalan.


Thanks,
Bryan Rasmussen
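For reference, here is a minimal Python sketch of building such a request URL. It assumes that `wt=xslt` selects the XSLT response writer and that `tr` names a stylesheet file under conf/xslt/. The helper name and the core URL are hypothetical. `urlencode` joins the parameters with `&` and escapes the query value.

```python
from urllib.parse import urlencode


def xslt_select_url(core_url: str, query: str, stylesheet: str) -> str:
    """Build a select URL that asks for the XSLT response writer."""
    params = {
        "q": query,
        "wt": "xslt",       # use the XSLT response writer
        "tr": stylesheet,   # stylesheet file name, looked up under conf/xslt/
    }
    # urlencode joins the pairs with '&' and percent-escapes the values.
    return core_url.rstrip("/") + "/select/?" + urlencode(params)


print(xslt_select_url("http://localhost:8983/solr/tester", "kongeriget", "result.xsl"))
# prints http://localhost:8983/solr/tester/select/?q=kongeriget&wt=xslt&tr=result.xsl
```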


Re: getTransformer error

2011-06-10 Thread bryan rasmussen
OK, I guess it is a stylesheet problem after all, as a basic stylesheet
that just outputs hello world works.

thanks,
Bryan Rasmussen



solr 3.1 java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory

2011-06-07 Thread bryan rasmussen
<str name="hl.fl">all_text title</str>
 <!-- for this field, we want no fragmenting, just highlighting -->
 <str name="f.name.hl.fragsize">150</str>
  </lst>
  <arr name="last-components">
<str>clustering</str>
  </arr>
</requestHandler>



with the following command to start Solr:
java -Dsolr.clustering.enabled=true -Dsolr.solr.home=C:\projects\solrexample\solr -jar start.jar

Any idea why clusty is not working?

Thanks,
Bryan Rasmussen


clustering problems on 3.1

2011-06-07 Thread bryan rasmussen
I added the following to my configuration:

  <lib dir="c:/projects/solrtest/dist/"
       regex="apache-solr-clustering-.*\.jar" />




<requestHandler name="clusty" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>

    <!-- Fields to cluster on -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">all_text</str>
    <str name="hl.fl">all_text title</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">150</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>


<searchComponent
    class="org.apache.solr.handler.clustering.ClusteringComponent"
    name="clustering">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

    <!-- Engine-specific parameters -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
</searchComponent>

which resulted in the error java.lang.NoClassDefFoundError:
org/carrot2/core/ControllerFactory. Whenever I made a request I got a
404 response back, and

SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.SolrCore@14db38a4 (core1)
has a reference count of 1

appeared in my console.

Any suggestions?

Thanks,
Bryan Rasmussen
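A NoClassDefFoundError means that a class, here one from the org.carrot2 packages, could not be found at runtime. A quick check is to see which jars a `<lib dir=... regex=...>` entry actually picks up. The Python sketch below assumes that the regex must match the whole file name, as with Java's `Matcher.matches()`; I believe Solr does this, but treat it as an assumption. The helper name `matching_jars` is hypothetical.

```python
import os
import re


def matching_jars(lib_dir: str, regex: str) -> list:
    """List the file names in lib_dir that the regex matches in full."""
    pattern = re.compile(regex)
    return sorted(name for name in os.listdir(lib_dir)
                  if pattern.fullmatch(name))
```

For example, `matching_jars("c:/projects/solrtest/dist/", r"apache-solr-clustering-.*\.jar")` only shows the clustering jar itself. Suppose no configured pattern matches a jar that contains the org.carrot2 classes. That would be consistent with the error above.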


Re: Solr vs ElasticSearch

2011-06-01 Thread bryan rasmussen
Well, I recently chose it for a personal project, and the deciding
factor for me was that it had nice integration with CouchDB.

Thanks,
Bryan Rasmussen
On Wed, Jun 1, 2011 at 4:33 AM, Mark static.void@gmail.com wrote:
 I've been hearing more and more about ElasticSearch. Can anyone give me a
 rough overview of how these two technologies differ? What are the
 strengths/weaknesses of each? Why would one choose one over the other?

 Thanks



Re: HTMLStripTransformer will remove the content in XML??

2011-05-27 Thread bryan rasmussen
I would expect that it doesn't understand CDATA and treats
everything between < and > as a 'tag'.

Best Regards,
Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote:
 I have an XML string like this:

 <?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr
 ]]></loc></language>

 By using HTMLStripTransformer, I expect to get 'hello,solr'.

 But actually this transformer removes ALL THE TEXT INSIDE!

 Did I do something silly, or is it a bug?

 Thank you
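The following Python sketch illustrates the guess above. It is not HTMLStripTransformer's actual implementation. A naive stripper treats everything from `<` to the next `>` as a tag. A CDATA section such as `<![CDATA[hello]]>` starts with `<` and ends with `>`, so the whole section, text included, disappears. Unwrapping CDATA first keeps the text. Both function names are hypothetical.

```python
import re

DOC = "<language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>"


def naive_strip(markup: str) -> str:
    # Treat everything from '<' to the next '>' as a tag and drop it.
    return re.sub(r"<[^>]*>", "", markup)


def strip_keeping_cdata(markup: str) -> str:
    # Unwrap CDATA sections first, so their contents survive as plain text.
    unwrapped = re.sub(r"<!\[CDATA\[(.*?)\]\]>", r"\1", markup, flags=re.S)
    return naive_strip(unwrapped)


print(repr(naive_strip(DOC)))          # prints ''
print(repr(strip_keeping_cdata(DOC)))  # prints 'hellosolr'
```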




Re: problem in setting field attribute in schema.xml

2011-05-26 Thread bryan rasmussen
 Yes... but when I set indexed=false for a particular field and search
 for *:*, it returns all documents, that's true, but what I think is it
 should not contain the field which I set as indexed=false.
 For example, in a document the fields are id, author and title, and for
 the author field I set indexed=false; then author should not be indexed,
 and when I perform the search *:* it should show all documents as
 <doc>
 <string name="id">id1</string>
 <string name="title">t1</string>
 <string name="author">a1</string>
 </doc>

Well, since I am only a beginner myself, I can only describe my own
experience. Say I have cleared my index, restarted, reindexed with the
new schema settings and restarted again (which is probably overdone),
and the schema I indexed with says indexed=false, stored=true for
author. If I search for author:a1 I get 0 results, as I expect, and if
I search for id:id1 it shows
 <doc>
 <string name="id">id1</string>
 <string name="title">t1</string>
 <string name="author">a1</string>
 </doc>
as I expect. Is this what is happening for you?

If it is happening and you are confused as to why, I can't explain it
at a technical level. I assume it comes from design decisions that, I
agree, don't seem sensible to me, but that very probably rest on some
underlying technical reason I am not familiar with.

If you want to make sure that you only see id and title in your
result, then either set stored=false for author (although I don't know
why you would have a field that is neither stored nor indexed) or use
the fl parameter on your request to give the list of fields you want
returned. For example, fl=id,title in the query string of the request
should mean you would just see
<string name="id">id1</string>
<string name="title">t1</string>
and not
<string name="author">a1</string>

Best Regards,
Bryan Rasmussen
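Here is a toy Python model of the behaviour described above, assuming a single document and exact-match search. It is only an illustration, not how Lucene stores or matches terms. Matching only looks at indexed fields. The returned document carries every stored field, optionally narrowed by an fl-style list. `SCHEMA`, `DOCS` and `search` are all hypothetical names.

```python
# Hypothetical schema: field -> (indexed, stored), mirroring the author example.
SCHEMA = {
    "id": (True, True),
    "title": (True, True),
    "author": (False, True),
}
DOCS = [{"id": "id1", "title": "t1", "author": "a1"}]


def search(field, value, fl=None):
    """Exact-match search on one field; returns dicts of stored fields."""
    indexed, _stored = SCHEMA[field]
    if not indexed:
        return []  # nothing was indexed for this field, so nothing can match
    hits = [doc for doc in DOCS if doc.get(field) == value]
    wanted = fl if fl is not None else list(SCHEMA)
    # Only stored fields can come back, whatever fl asks for.
    return [{f: doc[f] for f in wanted
             if SCHEMA.get(f, (False, False))[1] and f in doc}
            for doc in hits]
```

`search("author", "a1")` finds nothing. `search("id", "id1")` still returns author, because author is stored. Passing `fl=["id", "title"]` plays the role of `fl=id,title`.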


Re: problem in setting field attribute in schema.xml

2011-05-26 Thread bryan rasmussen
From my experience, if it is indexing content that you have told it not
to index, that is because you haven't cleared your old indexed content.
If you index something using schema version 5, which says
indexed=true, and then change it to indexed=false, you have to delete your
old indexed content and reindex using the new schema, with lots of
stopping and restarting involved.

So: delete the index, restart with the new schema, and index your content with the new schema.

Best Regards,
Bryan Rasmussen
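For the "delete the index" step, one option is a delete-by-query that matches every document, instead of removing the index directory by hand. The Python sketch below makes some assumptions. It assumes the core's XML update handler is at /update; the URL in the test is hypothetical. It also assumes `commit=true` is accepted as a request parameter. The helper only builds the request and does not send it.

```python
from urllib.request import Request


def delete_all_request(update_url: str) -> Request:
    """Build (but do not send) a POST that deletes every document and commits."""
    body = b"<delete><query>*:*</query></delete>"  # *:* matches all documents
    return Request(update_url + "?commit=true", data=body,
                   headers={"Content-Type": "text/xml; charset=utf-8"},
                   method="POST")
```

You would send it with `urllib.request.urlopen(delete_all_request(...))`, then restart with the new schema and reindex as described above.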

On Thu, May 26, 2011 at 11:24 AM, Romi romijain3...@gmail.com wrote:
 Thanks a lot, Bryan. It might be repetition again, but I just want to know
 WHY it is indexing the field when it is indexed=false, even if
 stored=true. It is clearly written in the documentation that a field is
 searchable only if it is indexed=true, which surely makes sense.
 My application does not require this; I am just experimenting with Solr
 to learn it and want to clear up my concepts about indexing.

 Thanks
 Romi

 -
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2988066.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: problem in setting field attribute in schema.xml

2011-05-26 Thread bryan rasmussen
Well, I'm probably being overly cautious here, but it's been my
experience that if I have a schema that says indexed=true on a field
and I change it to indexed=false, I have to delete my index to get
rid of everything that was indexed with the old schema, and I have to
restart to be able to index with the new schema.

I've had the situation a number of times where I have changed the
indexing rule for a field and not followed these steps and been
surprised when my index does not follow my expectations - and it seems
like you are experiencing the same thing.

Best Regards,
Bryan Rasmussen


Re: problem in setting field attribute in schema.xml

2011-05-26 Thread bryan rasmussen
On Thu, May 26, 2011 at 2:10 PM, Romi romijain3...@gmail.com wrote:
 did u mean when i set indexed=false and store=true, solr does not index
 the field's value but store its value as it is???
Yes. So you can get back the value of all stored fields even if your
search actually only finds results in indexed fields.

It does seem somewhat counter-intuitive.

Best Regards,
Bryan Rasmussen


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread bryan rasmussen
If you never want to see a field's value in your results, set stored=false.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 2:37 PM, Romi romijain3...@gmail.com wrote:
 In my schema.xml file I set a field's attributes to indexed=false and stored=true,
 i.e. I am not indexing this field, but I still get values for this field
 in my search results. Why is that, any idea?

 -
 Romi



Re: problem in setting field attribute in schema.xml

2011-05-25 Thread bryan rasmussen
Surely it indexes the data if you set indexed=true.

If you put some data in the field that is unique to that document and
then search for it, do you get the document? If not, it is because the
field is not indexed. If you search for another field in the same
document but still see the non-indexed field in the result, it is
because the non-indexed field is stored.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 3:11 PM, Romi romijain3...@gmail.com wrote:
 If I set stored=false then it indexes the data but does not show it in
 the search results. But in my case I do not want to index the data for
 a field, and to my surprise, even with indexed=false for this field, I
 am still able to get that data through the query *:*, but not when I
 run a filter query as field:value. It's really confusing what Solr is
 doing.

 -
 Romi



I only want to return a field's value in certain cases; how is this done?

2011-05-23 Thread bryan rasmussen
Let us say I have 3 fields that I index:
f1, f2, f3.

f1 and f2 are copied to f4.
f4 is the default search field.


There is a value that is found in f2 and f3.

When I am searching in f3, I want to return only f3 and no other field.
When I am searching in f4, I do not want to return f3.
I only want to return f1 if it contains the value that was searched for.


Is this doable? Can you show me an example?

Thanks,
Bryan Rasmussen


I need to improve highlighting

2011-05-18 Thread bryan rasmussen
Hi,

If I do a search
http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
the <lst name="highlighting"> subtree I get
<arr name="all_text">
<str>
Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
</str>
</arr>
</lst>


What I need to do is either to

1. Return all of all_text, which should be possible by setting
hl.fragsize=0, but I never get beyond the default for the field (I
can go below 100 but not above), or
2. Get a count of the number of highlighted instances (preferable), or
return each highlighted term in a separate str element, so
<str>kongeriget</str><str>kongeriget</str>


thanks,
Bryan Rasmussen


Re: I need to improve highlighting

2011-05-18 Thread bryan rasmussen
 Bryan, on Q2 - what about using xpath like 'str/em' ?

How do I do that? The highlighting result, at least in the Solr
installation I have (3.something), returns the em as escaped markup.
Is there an XPath parameter or configuration I can set for
highlighting, or a way to turn the em elements into actual
elements (hl.formatter maybe?)

Thanks,
Bryan Rasmussen
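If the em markup really arrives as escaped text, then after XML parsing it is just part of the str element's string value. An XPath like str/em cannot see it, but counting is still possible on the text. The Python sketch below assumes the default `<em>`/`</em>` highlight markers. It also assumes the highlighting section holds one lst per document, each containing an `arr name=FIELD` of str snippets, as in the output above. The helper name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def highlighted_terms(response_xml: bytes, field: str) -> list:
    """Collect the text inside every <em>...</em> pair in the field's snippets."""
    root = ET.fromstring(response_xml)
    path = ".//lst[@name='highlighting']/lst/arr[@name='%s']/str" % field
    terms = []
    for snippet in root.findall(path):
        # The markup was escaped in the response, so here it is plain text.
        terms.extend(re.findall(r"<em>(.*?)</em>", snippet.text or ""))
    return terms
```

`len(highlighted_terms(xml_bytes, "all_text"))` then gives the number of highlighted instances. The list itself is the "one string per highlighted term" variant.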






Re: I need to improve highlighting

2011-05-18 Thread bryan rasmussen
Yeah, but you just got me to check again. What I thought was Solr
ignoring my hl.fragsize setting and always using the default turned out
to be a smaller field being ranked higher. When I set it to 1000 and
saw the same as with 100, it was just the off chance that there were
only about 100 characters to see in the first 10 results. Funny.

thanks,
Bryan Rasmussen

On Wed, May 18, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote:
 Just checking, but have you tried setting
 hl.fragsize=<very large number> as suggested here:

 http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?

 If that's not the problem, please show us the results of
 attaching debugQuery=on to the request, that may shed
 some light on the problem.

 Best
 Erick





indexing xml attributes?

2011-05-17 Thread bryan rasmussen
Hi,

As I understand it, the DIH XPathEntityProcessor will not allow me to
index attributes, like so: <field column="ID" xpath="/ARTIKEL/@ID" />

So if I want to index attributes, should I pre-process the documents
into the format that Solr indexes normally and place the value of the
ID into a field?

Thanks,
Bryan Rasmussen


Re: indexing xml attributes?

2011-05-17 Thread bryan rasmussen
Ah, never mind: I had to restart my instance for my changes to the
data importer configuration to register.

thanks,
Bryan Rasmussen
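For the pre-processing route mentioned in the question, the Python sketch below turns one ARTIKEL document into a Solr XML update message (`<add><doc><field name=...>`). It copies the ID attribute into an ordinary field. The field names `ID` and `title` are assumptions and would have to exist in schema.xml. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def artikel_to_add(xml_bytes: bytes) -> str:
    """Turn one <ARTIKEL> document into a Solr XML <add> message."""
    src = ET.fromstring(xml_bytes)  # root element is <ARTIKEL>
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The ID attribute becomes an ordinary field value.
    ET.SubElement(doc, "field", name="ID").text = src.get("ID")
    ET.SubElement(doc, "field", name="title").text = src.findtext("DOKTITEL/OVERSKRIFT1")
    return ET.tostring(add, encoding="unicode")
```

The resulting string can be posted to the update handler like any other Solr XML update.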




How much does Solr enterprise server differ from the non Enterprise server?

2011-05-05 Thread bryan rasmussen
I am asking specifically because I am wondering whether it is worth my
time to read the Enterprise Server book, or whether the two have
diverged too much.

If I read the book are there any parts of the book specifically that
won't be relevant?

Thanks,
Bryan Rasmussen


Re: How much does Solr enterprise server differ from the non Enterprise server?

2011-05-05 Thread bryan rasmussen
OK, I just saw the thing about syncing the version numbers.

Is there any information on these Solr 3.1 books? Publishers,
publication dates, websites?

Best regards,
Bryan Rasmussen

On Thu, May 5, 2011 at 10:57 AM, Jan Høydahl jan@cominvent.com wrote:
 Hi,

 Solr IS an enterprise search server. And there is only one edition :)
 I'd wait a few more weeks until the Solr 3.1 books are available, and then 
 read up on it.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com





testing of stemming

2011-04-19 Thread bryan rasmussen
Hi,

I was wondering: if I have a large number of queries I want to test
stemming on, is there a free-standing library I can just run them
against, without all the overhead of an HTTP request?

Thanks,
Bryan Rasmussen


Re: testing of stemming

2011-04-19 Thread bryan rasmussen
Maybe not a library, but a command-line tool would be good: something
I can drive from code or a script to test that when I ask for the
word virksomhed in Danish, it would also return virksomhederne and
other variations.

I guess I was hoping for something similar to a WordNet of stems...

But at worst I would be fine with checking specifically against my
index. I just didn't want to automate the browser to do it, as I
figured it would be extra performance-intensive.

Thanks,
Bryan Rasmussen
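Such a test does not need HTTP if the stemmer can be called directly. The Python sketch below is a minimal harness that takes any stemming function and reports which variants reduce to the same stem as the query word. The name `conflation_report` is hypothetical, and the harness does not include a stemmer of its own.

```python
from typing import Callable, Dict, Iterable


def conflation_report(stem: Callable[[str], str], query: str,
                      variants: Iterable[str]) -> Dict[str, bool]:
    """For each variant, report whether it reduces to the same stem as query."""
    target = stem(query.lower())
    return {v: stem(v.lower()) == target for v in variants}
```

With NLTK installed, for example, you could pass `SnowballStemmer("danish").stem` from `nltk.stem.snowball` as `stem`. That tests the Snowball Danish algorithm itself. It only tells you about Solr if the field's analyzer uses the same stemmer, for example SnowballPorterFilterFactory with language="Danish".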



On Tue, Apr 19, 2011 at 5:19 PM, Erick Erickson erickerick...@gmail.com wrote:
 I'm not sure what a free standing library would look like. Do you
 want it to check that all the terms in your index are stemmed
 correctly (or at least as expected)?

 You have a bunch of queries. How would such a library test them
 against your corpus?

 There's not enough information here to give a meaningful answer

 Best
 Erick





Re: testing of stemming

2011-04-19 Thread bryan rasmussen
that looks like a good starting point,

thanks,
bryan rasmussen

2011/4/19 François Schiettecatte fschietteca...@gmail.com:
 I would start here:

        http://snowball.tartarus.org/

 François





all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hi,
I am starting my Solr instance with the command
java -Dsolr.solr.home=./test1/solr/ -jar start.jar
where I have a solr.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="false" instanceDir="tester" name="tester"/>
  </cores>
</solr>

In the folder tester I have configurations - adapted from the rss examples

DataImporter.xml
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor"
            fileName="^.*\.xml$" recursive="true"
            baseDir="/projects/solrtest/transformedimport">
      <entity name="x" rootEntity="true"
              dataSource="myfilereader"
              processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}"
              stream="false" forEach="/ARTIKEL"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate="processing ${jc.fileAbsolutePath}"
              logLevel="info">

        <field column="title" xpath="/DOKTITEL/OVERSKRIFT1" />
        <field column="text" xpath="/AKROP/TXT" />

      </entity>
    </entity>
  </document>
</dataConfig>

solrconfig.xml: the same as the RSS example, only with the elevate components removed.

schema.xml

 <fields>
<field name="title" type="text" indexed="true" stored="true" />
<field name="txt" type="text" indexed="true" stored="true" />
<field name="all_text" type="text" indexed="true" stored="true"
multiValued="true" />
<copyField source="title" dest="all_text" />
<copyField source="txt" dest="all_text" />
</fields>

I removed the uniqueKey constraint.

When I go to http://localhost:8983/solr/tester/admin/
I get the admin page.
When I run http://localhost:8983/solr/tester/dataimport?command=full-import
it says

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">16</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">dataimporter.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages"/>
<str name="WARNING">
This response format is experimental.  It is likely to change in the future.
</str>
</response>
When I look at the log of that it says a bunch of stuff like:

INFO: processing c:\projects\solrtest\transformed\1.xml
org.apache.solr.common.util.XMLErrorLogger report
WARNING: XmL parser reported xml declaration in null, line 1, column
38: Inconsistent text encoding; declared as utf-8 in xml
declaration, application had passed Cp1252

Here is one of the processed documents:

<?xml version="1.0" encoding="utf-8" ?>
<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL>
    <OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1>
  </DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige. Dette gælder også når
de faktureres koncerninternt, f.eks. fra et moderselskab
(holdingselskab) til et datterselskab.</TXT>
    <TXT>Der er fradragsret for moms vedrørende køb af
administrationsydelser i samme omfang, som virksomheden kan fratrække
momsen af øvrige fællesomkostninger.</TXT>
    <TXT>Hvis administrationsydelser faktureres på tværs af
landegrænserne, f.eks. indenfor internationale koncerner, kan der
gælde forskellige principper for momsberegningen i de enkelte
EU-lande. Hvis en administrationsydelse faktureres fra Danmark til et
datterselskab i et andet land, herunder også i andre EU-lande, er det
myndighedernes holdning, at der skal faktureres med dansk moms.</TXT>
    <TXT>Hvis en administrationsydelse faktureres mellem et selskab og
dets filial/-er, skal faktura altid udstedes uden moms. Handel med
ydelser mellem et selskab og dets filial/-er anses ikke for at udgøre
momspligtige transaktioner.</TXT>
    <TXTO>Regler</TXTO>
    <TXT>
      <LR IDREF="LBKG2005966.§15" CREATOR="autolink" TARGETTYPE="RELML">§ 15</LR>
    </TXT>
  </AKROP>
</ARTIKEL>

If I search for the text Administrationsydelser
http://localhost:8983/solr/tester/select/?q=Administrationsydelser&version=2.2&start=0&rows=10&indent=on
I get

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">Administrationsydelser</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

There is a segments.gen and a segments_4 file in my index but nothing
else. I tried looking with Luke, but it seems not to be compatible with
the newest versions of Lucene...

version of solr is 3.1.0

Thanks,
Bryan Rasmussen


Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Also if I check
solr/tester/dataimport it responds:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">dataimporter.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">1634</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-04-18 11:55:47</str>
<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
</str>
<str name="Committed">2011-04-18 11:55:48</str>
<str name="Optimized">2011-04-18 11:55:48</str>
<str name="Total Documents Processed">0</str>
<str name="Time taken">0:0:0.922</str>
</lst>
<str name="WARNING">
This response format is experimental.  It is likely to change in the future.
</str>
</response>



Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hah, actually I tried with complete XPaths earlier, but they weren't
working because I had made a mistake in my forEach, and
then I decided that the forEach and the other XPaths were probably
being concatenated.

However, it is not completely right yet. If I run
http://localhost:8983/solr/tester/dataimport?command=full-import&debug=true
I get

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">422</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">dataimporter.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="mode">debug</str>
<arr name="documents">
<lst><arr name="title"><str>Forord (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Abonnementsudgifter (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Ab skf (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Acontobeløb (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Adgang til arrangementer (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Administration, fast ejendom (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Administrationsfællesskab (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Administrationsydelser (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Adsl (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Advokatomkostninger (MomsManual)</str></arr></lst>
<lst><arr name="title"><str>Afbestillingsgebyrer (MomsManual)</str></arr></lst>
</arr>
<lst name="verbose-output"/>
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">22</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-04-18 12:26:52</str>
<str name="">
Indexing completed. Added/Updated: 11 documents. Deleted 0 documents.
</str>
<str name="Total Documents Processed">11</str>
<str name="Time taken">0:0:0.406</str>
</lst>
<str name="WARNING">
This response format is experimental.  It is likely to change in the future.
</str>
</response>

So the title fields
 <field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
are being added, but not the text fields
 <field column="text" xpath="/ARTIKEL/AKROP/TXT" />

The most salient difference between these two is that there will be
more than one TXT. I just tried with the parent element, however, and it
didn't do anything.

But when I do a search for MomsManual, which you can see is in all the
title fields,
I get
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">MomsManual</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

:(

Thanks,
Bryan Rasmussen
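To see what the two paths select outside DIH, the Python sketch below uses the standard library's ElementTree on a shortened copy of the sample document. This is only an analogy, since DIH's XPathEntityProcessor supports its own XPath subset. ElementTree paths are relative to the root element, so `DOKTITEL/OVERSKRIFT1` here corresponds to `/ARTIKEL/DOKTITEL/OVERSKRIFT1` in the DIH config. The title path selects one value, while `AKROP/TXT` selects several. The receiving field would therefore presumably need to accept multiple values. The helper name `extract` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample ARTIKEL document from the post.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<ARTIKEL ID="MM2010ADMINISTRATIONSYDELSER">
  <DOKTITEL><OVERSKRIFT1>Administrationsydelser (MomsManual)</OVERSKRIFT1></DOKTITEL>
  <AKROP>
    <TXT>Administrationsydelser er momspligtige.</TXT>
    <TXT>Der er fradragsret for moms.</TXT>
    <TXTO>Regler</TXTO>
    <TXT><LR IDREF="LBKG2005966.§15">§ 15</LR></TXT>
  </AKROP>
</ARTIKEL>""".encode("utf-8")


def extract(xml_bytes: bytes) -> dict:
    root = ET.fromstring(xml_bytes)  # root is <ARTIKEL>
    titles = [el.text for el in root.findall("DOKTITEL/OVERSKRIFT1")]
    # itertext() also collects text from child elements such as <LR>.
    texts = ["".join(el.itertext()).strip() for el in root.findall("AKROP/TXT")]
    return {"title": titles, "text": texts}


result = extract(SAMPLE)
print(len(result["title"]), len(result["text"]))  # prints 1 3
```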

On Mon, Apr 18, 2011 at 12:23 PM, lboutros boutr...@gmail.com wrote:
 did you try with the complete xpath?

 field column=title     xpath=/ARTIKEL/DOKTITEL/OVERSKRIFT1 /
 field column=text      xpath=/ARTIKEL/AKROP/TXT  /

 Ludovic.

 -
 Jouve
 France.
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/all-searches-return-0-hits-what-have-I-done-wrong-tp2833706p2833798.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
/
filter class=solr.SynonymFilterFactory
synonyms=synonyms.txt ignoreCase=true expand=true/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt/
filter class=solr.WordDelimiterFilterFactory
generateWordParts=1 generateNumberParts=1 catenateWords=0
catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.KeywordMarkerFilterFactory
protected=protwords.txt/
filter class=solr.PorterStemFilterFactory/
filter class=solr.RemoveDuplicatesTokenFilterFactory/
  /analyzer
/fieldtype

 /types


 fields
field name=title type=text indexed=true stored=true /
field name=txt type=text indexed=true stored=true /
field name=all_text type=text indexed=true stored=true
multiValued=true /
copyField source=title dest=all_text /
copyField source=txt dest=all_text /
/fields
 defaultSearchFieldall_text/defaultSearchField
 solrQueryParser defaultOperator=AND/

/schema


The protwords.txt and stopwords.txt files are also from the RSS example.

thanks,
Bryan Rasmussen

On Mon, Apr 18, 2011 at 12:55 PM, lboutros boutr...@gmail.com wrote:
 If a document contains multiple 'txt' fields, it should be marked as
 'multiValued'.

 field name=txt type=text indexed=true stored=true
 multiValued=true/

 But if I understand correctly, you also tried this?

 field column=text      xpath=/ARTIKEL/AKROP  /

 And for your search (MomsManual), could you send us the analyzer from your
 schema.xml, please?

 Ludovic.

 -
 Jouve
 France.
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/all-searches-return-0-hits-what-have-I-done-wrong-tp2833706p2833876.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: all searches return 0 hits - what have I done wrong?

2011-04-18 Thread bryan rasmussen
Hmm, OK, I see the schema was wrong: I was calling the TEXT field
txt. Also, I am now getting results for my title search after another
restart and reindex, with the TXT fields set to multiValued.
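A quick way to catch this kind of name mismatch is to compare the field names DIH produces with the fields declared in schema.xml. The sketch below is hypothetical, not a Solr tool: it uses abbreviated stand-ins for data-config.xml and schema.xml rebuilt from the snippets quoted in this thread (angle brackets restored, everything else omitted). It assumes a DIH field's Solr name is its `name` attribute, falling back to `column`, and it ignores dynamicField patterns.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical stand-ins rebuilt from the snippets quoted in this thread.
DATA_CONFIG = """<dataConfig><document><entity name="artikel">
  <field column="title" xpath="/ARTIKEL/DOKTITEL/OVERSKRIFT1" />
  <field column="text" xpath="/ARTIKEL/AKROP/TXT" />
</entity></document></dataConfig>"""

SCHEMA = """<schema name="tester" version="1.1"><fields>
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="txt" type="text" indexed="true" stored="true" />
  <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />
</fields></schema>"""


def unmapped_columns(data_config, schema):
    """Return DIH target field names that have no matching <field> in the schema."""
    # Assumption: the Solr field name is "name" if present, otherwise "column".
    targets = {
        f.get("name") or f.get("column")
        for f in ET.fromstring(data_config).iter("field")
    }
    declared = {f.get("name") for f in ET.fromstring(schema).iter("field")}
    # Drop any field element that had neither attribute before comparing.
    return {t for t in targets if t} - declared


print(unmapped_columns(DATA_CONFIG, SCHEMA))
```

Run against these two snippets, it reports the `text` column, which has no schema field of that name (the schema calls it `txt`).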

Thanks,
Bryan Rasmussen

On Mon, Apr 18, 2011 at 1:09 PM, bryan rasmussen
rasmussen.br...@gmail.com wrote:
 Well, basically I copied the RSS example, as I figured it would be
 the closest to what I wanted to do.

 ?xml version=1.0 encoding=UTF-8 ?
 schema name=tester version=1.1
  types
    fieldType name=string class=solr.StrField
 sortMissingLast=true omitNorms=true/
    fieldType name=boolean class=solr.BoolField
 sortMissingLast=true omitNorms=true/
    fieldType name=integer class=solr.IntField omitNorms=true/
    fieldType name=long class=solr.LongField omitNorms=true/
    fieldType name=float class=solr.FloatField omitNorms=true/
    fieldType name=double class=solr.DoubleField omitNorms=true/
    fieldType name=sint class=solr.SortableIntField
 sortMissingLast=true omitNorms=true/
    fieldType name=slong class=solr.SortableLongField
 sortMissingLast=true omitNorms=true/
    fieldType name=sfloat class=solr.SortableFloatField
 sortMissingLast=true omitNorms=true/
    fieldType name=sdouble class=solr.SortableDoubleField
 sortMissingLast=true omitNorms=true/
    fieldType name=date class=solr.DateField
 sortMissingLast=true omitNorms=true/
    fieldType name=random class=solr.RandomSortField indexed=true /
    fieldType name=text_ws class=solr.TextField 
 positionIncrementGap=100
      analyzer
        tokenizer class=solr.WhitespaceTokenizerFactory/
      /analyzer
    /fieldType
    fieldType name=text class=solr.TextField positionIncrementGap=100
      analyzer type=index
        tokenizer class=solr.WhitespaceTokenizerFactory/
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
        filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
        filter class=solr.LowerCaseFilterFactory/
        filter class=solr.KeywordMarkerFilterFactory
 protected=protwords.txt/
        filter class=solr.PorterStemFilterFactory/
        filter class=solr.RemoveDuplicatesTokenFilterFactory/
      /analyzer
      analyzer type=query
        tokenizer class=solr.WhitespaceTokenizerFactory/
        filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt ignoreCase=true expand=true/
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
        filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=0
 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
        filter class=solr.LowerCaseFilterFactory/
        filter class=solr.KeywordMarkerFilterFactory
 protected=protwords.txt/
        filter class=solr.PorterStemFilterFactory/
        filter class=solr.RemoveDuplicatesTokenFilterFactory/
      /analyzer
    /fieldType


    !-- Less flexible matching, but less false matches.  Probably not
 ideal for product names,
         but may be good for SKUs.  Can insert dashes in the wrong
 place and still match. --
    fieldType name=textTight class=solr.TextField
 positionIncrementGap=100 
      analyzer
        tokenizer class=solr.WhitespaceTokenizerFactory/
        filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt ignoreCase=true expand=false/
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
        filter class=solr.WordDelimiterFilterFactory
 generateWordParts=0 generateNumberParts=0 catenateWords=1
 catenateNumbers=1 catenateAll=0/
        filter class=solr.LowerCaseFilterFactory/
        filter class=solr.KeywordMarkerFilterFactory
 protected=protwords.txt/
        filter class=solr.EnglishMinimalStemFilterFactory/
        filter class=solr.RemoveDuplicatesTokenFilterFactory/
      /analyzer
    /fieldType

    fieldType name=alphaOnlySort class=solr.TextField
 sortMissingLast=true omitNorms=true
      analyzer
        tokenizer class=solr.KeywordTokenizerFactory/
        filter class=solr.LowerCaseFilterFactory /
        !-- The TrimFilter removes any leading or trailing whitespace --
        filter class=solr.TrimFilterFactory /
        filter class=solr.PatternReplaceFilterFactory
                pattern=([^a-z]) replacement= replace=all
        /
      /analyzer
    /fieldType

    fieldtype name=ignored stored=false indexed=false
 class=solr.StrField /

    fieldtype name=html stored=true indexed=true class=solr.TextField
      analyzer type=index
        charFilter class=solr.HTMLStripCharFilterFactory/
        tokenizer class=solr.StandardTokenizerFactory/
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt/
        filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
        filter class=solr.LowerCaseFilterFactory

command=full-import not working, indexes 11 documents

2011-04-18 Thread bryan rasmussen
Hi,

I am using a DataImportHandler to get files from the file system. If I
request the URL
http://localhost:8983/solr/tester/dataimport?command=full-import it
ends up indexing 11 documents.
If I request
http://localhost:8983/solr/tester/dataimport?command=full-import&rows=817
(the number of documents I have), they all get indexed.

Is there anything I might have overlooked in the configuration
that would cause this?
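When passing extra request parameters such as `rows`, building the query string programmatically guarantees the `&` separator between parameters. This is a minimal sketch using only Python's standard library. The base URL and the `rows` value come from the message above; the helper name `full_import_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the message above; "tester" is the core name used there.
BASE = "http://localhost:8983/solr/tester/dataimport"


def full_import_url(rows: Optional[int] = None) -> str:
    """Build a DIH full-import URL, optionally with an explicit rows parameter."""
    params = {"command": "full-import"}
    if rows is not None:
        params["rows"] = str(rows)
    # urlencode joins the parameters with "&" and escapes values as needed.
    return BASE + "?" + urlencode(params)


print(full_import_url())
print(full_import_url(817))
```

The second call produces the same URL as the request that indexed all 817 documents.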

Thanks,
Bryan Rasmussen


newbie - filter to only show queried field when query is free text

2011-04-15 Thread bryan rasmussen
Hi,

I want to filter a search result so that it does not return all fields,
as it does by default, but I don't know in advance which field my hits
will be in. How can I do this?

This is basically for unstructured document-type data, for example
large HTML or DocBook documents.

thanks,
Bryan Rasmussen


DataImportHandler - importing XML documents, undeclared general entity - DTD right there

2011-04-15 Thread bryan rasmussen
Hi,
I am importing a number of XML documents from the file system. The
DataImportHandler finds them, but returns an "undeclared general entity"
error, even though my DTD is present and can be found by other parsers.

The DTD declaration, in an XML file in the same folder as the DTD
(allartikel.dtd), is:
!DOCTYPE ARTIKEL PUBLIC -//Thomson Information AS//DTD ARTIKEL//DK
allartikel.dtd

Thanks,
Bryan Rasmussen