SolrJ: clusters, labels, docs - search results

2012-05-21 Thread okayndc
Hello,

I was wondering how to access the cluster labels and docs (ids) via SolrJ.

I have added the following:
   query.setParam("q", userQuery);
   query.setParam("clustering", true);
   query.setParam("qt", "/core2/clustering");
   query.setParam("carrot.title", "title"); // field Carrot2 uses as the document title

But how do I access the labels and docs in the clusters and display them in a
search result?

Also, I've seen others specify clustering in this manner...

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/core2/clustering");
params.set("q", userQuery);
params.set("carrot.title", "title");
params.set("clustering", true);


Is this preferred over the other?

Thanks
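For the clustering question above: the clustering component returns a top-level "clusters" section next to the normal results, and each cluster carries its "labels", a "score" and the ids of its "docs". On the second question, SolrQuery is a subclass of ModifiableSolrParams, so both styles send the same parameters. The sketch below assumes you have already pulled the clusters section out of the SolrJ response (for example with response.getResponse().get("clusters"), which yields a generic list of NamedList entries) and that the document ids are strings. Plain Maps stand in for NamedList so the example runs without SolrJ; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ClusterDisplay {

    /** Formats each cluster as "label1, label2 (N docs): id1 id2 ...". */
    @SuppressWarnings("unchecked")
    static List<String> describe(List<Map<String, Object>> clusters) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> cluster : clusters) {
            // Assumes labels and doc ids are lists of strings.
            List<String> labels = (List<String>) cluster.get("labels");
            List<String> docIds = (List<String>) cluster.get("docs");
            lines.add(String.join(", ", labels) + " (" + docIds.size() + " docs): "
                    + String.join(" ", docIds));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical data shaped like the clustering component's output.
        List<Map<String, Object>> clusters = List.of(
                Map.<String, Object>of("labels", List.of("Bird Houses"), "score", 12.5,
                        "docs", List.of("doc1", "doc7")),
                Map.<String, Object>of("labels", List.of("Other Topics"), "score", 0.0,
                        "docs", List.of("doc3")));
        describe(clusters).forEach(System.out::println);
    }
}
```

Each returned line could then be rendered as a cluster heading followed by links to the listed document ids.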


solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Hello,

Is it possible to concatenate a field via DIH?
For example for the id field, in order to make it unique
I want to add 'project' to the beginning of the id field.
So the field would look like 'project1234'
Is this possible?

<field column="id" name="id" />

Thanks
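A minimal illustration, assuming DIH's TemplateTransformer (suggested in the reply below) and a hypothetical entity named e: a field such as <field column="id" template="project${e.id}" /> on an entity declared with transformer="TemplateTransformer" should yield project1234 for a row whose id is 1234. The Java below only mimics that ${...} substitution to show the intended result. It is not DIH's implementation, and it treats a missing variable as an empty string, which DIH may not do.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSketch {

    // Matches ${name}, capturing the variable name (e.g. "e.id").
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with row.get(name); missing names become "" in this sketch. */
    static String resolve(String template, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = row.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row from an entity named "e".
        System.out.println(resolve("project${e.id}", Map.of("e.id", "1234"))); // project1234
    }
}
```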


Re: solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Thanks guys.  I had taken a quick look at
the Template Transformer and it looks like it does
what I need it to do. I didn't see the 'hello' part
when reviewing earlier.

On Sat, May 5, 2012 at 11:47 AM, Jack Krupansky j...@basetechnology.com wrote:

 Sounds like you need a Template Transformer: ... it helps to
 concatenate multiple values or add extra characters to field for injection.

 <entity name="e" transformer="TemplateTransformer" ...>
 <field column="namedesc" template="hello${e.name},${eparent.surname}" />
 ...
 </entity>

 See:
 http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

 Or did you have something different in mind?

 -- Jack Krupansky




Re: how to present html content in browse

2012-05-04 Thread okayndc
Hello,

I'm having a hard time understanding this, and I had this same question.

When using DIH, should the HTML field be stored in the raw HTML string field
or in the stripped field?
Also, which source field(s) need to be copied, and to which destination?

Thanks


On Thu, May 3, 2012 at 10:15 PM, Lance Norskog goks...@gmail.com wrote:

 Make two fields: one that indexes the stripped HTML and another that
 stores the full, unstripped HTML. You can use copyField so that you do not
 have to submit the HTML page twice.

 You would mark the stripped field 'indexed=true stored=false' and the
 full text field the other way around. The full text field should be a
 String type.

 On Thu, May 3, 2012 at 1:04 PM, srini softtec...@gmail.com wrote:
  I am indexing records from a database using DIH. The content of my records
  is in HTML format. When I use /browse, I would like to show the content in
  HTML format, not as plain text. Any ideas?
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
  Sent from the Solr - User mailing list archive at Nabble.com.



 --
 Lance Norskog
 goks...@gmail.com



solr: how to change display name of a facet?

2012-05-03 Thread okayndc
Hello,

Is there a way to change the display name (that contains spaces or special
characters) for a facet without changing the value of the facet field? For
example if my facet field name is 'category', I want to change the display
name of the facet to 'Categories and Stuff'

I've experimented with this:
<str name="facet.field">{!ex=dt key=Categories and Stuff}category</str>

I'm not really sure what 'ex=dt' does, but it seems that 'key' sets the
desired display name. If there are spaces in the 'key' value, the display
name gets cut off.  What am I doing wrong?

Any help is greatly appreciated.


Re: solr: how to change display name of a facet?

2012-05-03 Thread okayndc
Awesome, thanks!

On Thu, May 3, 2012 at 2:32 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Thu, May 3, 2012 at 2:26 PM, okayndc bodymo...@gmail.com wrote:
 [...]
  I've experimented with this:
  <str name="facet.field">{!ex=dt key=Categories and Stuff}category</str>
 
  I'm not really sure what 'ex=dt' does but it's obvious that 'key' is the
  desired display name? If there are spaces in the 'key' value, the display
  name gets cut off.  What am I doing wrong?

 http://wiki.apache.org/solr/LocalParams
 For a non-simple parameter value, enclose it in single quotes

 ex excludes filters tagged with a value.
 See

 http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10
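Applying the LocalParams rule above, the key value just needs single quotes, so the solrconfig.xml default becomes <str name="facet.field">{!ex=dt key='Categories and Stuff'}category</str>. The helper below sketches building that value from code. It assumes backslash escaping for any quote or backslash inside the quoted value, and the class and method names are hypothetical. Note that ex=dt only has an effect if some filter query was tagged with {!tag=dt}.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FacetKey {

    /** Wraps a local-param value in single quotes, backslash-escaping \ and '. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds e.g. {!ex=dt key='Categories and Stuff'}category */
    static String facetField(String excludeTag, String displayKey, String field) {
        return "{!ex=" + excludeTag + " key=" + quote(displayKey) + "}" + field;
    }

    public static void main(String[] args) {
        String value = facetField("dt", "Categories and Stuff", "category");
        System.out.println(value); // {!ex=dt key='Categories and Stuff'}category
        // Percent-encoded form for a raw HTTP request parameter.
        System.out.println("facet.field=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
    }
}
```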



Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
Thank you Jack.

So, is it not doable/possible to search and highlight keywords within a
field that contains the raw formatted HTML, and strip out the HTML tags
during analysis, so that a user would get back nothing if they did a
search for, e.g., <p>?

On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky j...@basetechnology.com wrote:

 I was thinking that you wanted to index the actual text from the HTML
 page, but have the stored field value still have the raw HTML with tags. If
 you just want to store only the raw HTML, a simple string field is
 sufficient, but then you can't easily do a text search on it.

 Or, you can have two fields, one string field for the raw HTML (stored,
 but not indexed) and then do a CopyField to a text field that has the
 HTMLStripCharFilter to strip the HTML tags and index only the text
 (indexed, but not stored.)

 -- Jack Krupansky






Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
Awesome, I'll give it a try.  Thanks Jack!

On Tue, May 1, 2012 at 10:23 AM, Jack Krupansky j...@basetechnology.com wrote:

 Sorry for the confusion. It is doable. If you feed the raw HTML into a
 field that has the HTMLStripCharFilter, the stored value will retain the
 HTML tags, while the indexed text will be stripped of the tags
 during analysis and be searchable just like a normal text field. Then,
 search will not see <p>.


 -- Jack Krupansky
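Jack's point, that the stored value keeps the raw HTML while the indexed terms come from the stripped text, can be modelled roughly as follows. This is a sketch only: the regular expression is a crude stand-in for HTMLStripCharFilter (it does not decode entities or handle script/style blocks), the split is a simple tokenizer rather than a real Solr analyzer, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class StripModel {

    /** The stored value is the field input exactly as submitted, tags included. */
    static String stored(String rawHtml) {
        return rawHtml;
    }

    /** Indexed terms: tags removed, then lower-cased and split on non-alphanumerics. */
    static Set<String> indexedTerms(String rawHtml) {
        String text = rawHtml.replaceAll("<[^>]*>", " "); // crude tag stripping
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        String html = "<p>Text here</p><img src=\"image.gif\" /><p>More text here</p>";
        System.out.println(stored(html));                        // raw HTML, tags intact
        System.out.println(indexedTerms(html).contains("p"));    // false: tag names are not indexed
        System.out.println(indexedTerms(html).contains("text")); // true
    }
}
```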





Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Hello,

Over the weekend I experimented with extracting HTML content via cURL, and I
was just wondering why the extraction/indexing process does not include the
HTML tags. It seems as though the HTML tags are either being ignored or
stripped somewhere in the pipeline. If this is the case, is it possible to
include the HTML tags, as I would like to keep the formatted HTML intact?

Any help is greatly appreciated.


Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Great, thank you for the input.  My understanding of HTMLStripCharFilter is
that it strips HTML tags, which is not what I want ~ is this correct?  I
want to keep the HTML tags intact.

On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky j...@basetechnology.com wrote:

 If by extracting HTML content via cURL you mean using SolrCell to parse
 html files, this seems to make sense. The sequence is that regardless of
 the file type, each file extraction parser will strip off all formatting
 and produce a raw text stream. Office, PDF, and HTML files are all treated
 the same in that way. Then, the unformatted text stream is sent through the
 field type analyzers to be tokenized into terms that Lucene can index. The
 input string to the field type analyzer is what gets stored for the field,
 but this occurs after the extraction file parser has already removed
 formatting.

 No way for the formatting to be preserved in that case, other than to go
 back to the original input document before extraction parsing.

 If you really do want to preserve full HTML formatted text, you would need
 to define a field whose field type uses the HTMLStripCharFilter and then
 directly add documents that direct the raw HTML to that field.

 There may be some other way to hook into the update processing chain, but
 that may be too much effort compared to the HTML strip filter.

 -- Jack Krupansky




escaping HTML tags within XML file

2011-09-25 Thread okayndc
Hello,

I was wondering if it is necessary to escape HTML tags within an XML file for
indexing.  If so, it seems like large XML files with tons of HTML tags could
get really messy (using CDATA).
Has this been your experience?  Do you escape the HTML tags? If so, what
technique do you use? Or do you leave the HTML tags in place without
escaping them?

Thanks!


Re: escaping HTML tags within XML file

2011-09-25 Thread okayndc
Here is a representation of the XML file...

<root>
<commenter>
<comment><p>Text here</p><img src="image.gif" /><p>More text
here</p></comment>
</commenter>
</root>

I want to keep the HTML tags because they keep the formatting (paragraph
tags, etc.) intact for the output.  It seems like you're saying that the HTML
can be kept intact with the use of an HTML field type without having to
escape the HTML tags?
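If the documents are posted in Solr's XML update format, markup inside a field value has to be escaped (or wrapped in a CDATA section) for the message to be well-formed XML. The XML parser turns the entities back into the original characters, so the stored value is the unescaped HTML. A minimal sketch of the escaping approach, assuming hypothetical field names id and comment:

```java
final class XmlUpdate {

    /** Escapes the characters that would otherwise break the surrounding XML. */
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities stay intact
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds a one-document <add> message with the HTML as a field value. */
    static String addDoc(String id, String html) {
        return "<add><doc>"
                + "<field name=\"id\">" + escape(id) + "</field>"
                + "<field name=\"comment\">" + escape(html) + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) {
        String html = "<p>Text here</p><img src=\"image.gif\" /><p>More text here</p>";
        System.out.println(addDoc("comment-1", html));
    }
}
```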

On Sun, Sep 25, 2011 at 2:52 PM, pulkitsing...@gmail.com wrote:

 Assuming that the XML has the HTML as values inside fully formed tags like
 so:
 <node><HTML></HTML></node>, then I think that using the HTML field type in
 schema.xml for indexing/storing will allow you to do meaningful searches on
 the content of the HTML without getting confused by the HTML syntax itself.

 If you have absolutely no need for the entire stored HTML when presenting
 results to the user then stripping out the syntax at index time makes sense.
 This will adversely affect highlighting of  that document field as well so
 just know your requirements.

 If you don't want to present anything at all then don't store, just index
 and use the right field type (HTML) such that search results find the right
 document. Just because a field is helpful in finding the doc, doesn't mean
 folks always want to present it or store it.

 With Data Import Handler, an HTML stripping transformer is available so that
 the markup is removed before the indexer gets its hands on things. I can't be
 sure if that is how you get your data into Solr.

 - Pulkit

 Sent from my iPhone




running SOLR on same server as your website

2011-09-07 Thread okayndc
Hi everyone!

Is it not a good practice to run SOLR on the same server where your website
files sit?  Or is it a MUST to house SOLR on its own application server?
My website's files sit in a servlet container (Tomcat), and I think it
would be more convenient to house the SOLR instance on the same server.
Is this not a good idea?  What is your SOLR setup?

Thanks


Re: running SOLR on same server as your website

2011-09-07 Thread okayndc
By "application", I assume that you mean SolrJ (for example)?

On Wed, Sep 7, 2011 at 10:04 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

 It's not necessarily a bad idea... as long as you secure it properly such
 that user requests cannot hit Solr, only requests from your application can
 do so.

 Eventually, perhaps, scale would be an issue and you'd want/need to
 separate the tiers, but as long as you've got security and scalability
 covered there's no reason not to deploy together like that.

Erik





Re: running SOLR on same server as your website

2011-09-07 Thread okayndc
Right now, the index is relatively small (less than 1 MB).  I think
it's okay for now, but a couple of years down the road, we may have to
move SOLR onto a separate application server.

On Wed, Sep 7, 2011 at 10:15 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

 You could host Solr inside the same Tomcat container, or in a different
 servlet container (say, a second Tomcat instance) on the same server.

 Be aware of your OS memory requirements, though:  In my experience, Solr
 performs best when it has lots of OS memory to cache index files (at least,
 if your index is very big).  For that reason alone, we chose to host our
 Solr instance (used internally only) in a separate virtual machine in its
 own web app server instance.

 It is all a matter of managing your memory, CPU and disk performance.  If
 those are already constrained or nearly constrained on your website, then
 adding Solr into that mix is probably not such a good idea.  If those are
 not issues on your existing website, and your Solr load is modest, then you
 can probably squeeze it onto the same server.

 Like most real-world answers, it comes down to "it depends."

 JRJ




Re: solr/velocity: function for sorting asc/desc

2011-07-13 Thread okayndc
Thanks Erik.

So if I had a link "Sort Title" and the default is sort=title desc, how can
I switch that to sort=title asc?

Example: <a href="#">Sort Title</a> (default sort=title desc). When the user
clicks the link, the sort should toggle (or switch) to sort=title asc.

How can this be achieved?
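One way to express the toggle: if the current request already sorts by title asc, the link asks for title desc; otherwise (including the default title desc) it asks for title asc. In the /browse UI this logic would live in a Velocity template; the Java below only sketches the logic, and the relative ?q=...&sort=... link shape and the class and method names are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SortToggle {

    /** Sort value the link should request next: asc flips to desc, anything else to asc. */
    static String next(String currentSort, String field) {
        String asc = field + " asc";
        if (currentSort != null && currentSort.trim().equals(asc)) {
            return field + " desc";
        }
        // No sort, the default descending sort, or a sort on another field: go ascending.
        return asc;
    }

    /** Relative link that repeats the query with the toggled sort. */
    static String href(String query, String currentSort, String field) {
        return "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&sort=" + URLEncoder.encode(next(currentSort, field), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(href("bird house", null, "title"));        // ?q=bird+house&sort=title+asc
        System.out.println(href("bird house", "title asc", "title")); // ?q=bird+house&sort=title+desc
    }
}
```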




Re: solr/velocity: function for sorting asc/desc

2011-07-13 Thread okayndc
Awesome, thanks Erik!



solr/velocity: function for sorting asc/desc

2011-07-12 Thread okayndc
Hello,

I was wondering if there is a Solr/Velocity function out there that can
sort, say, a title name by clicking on a link named "Sort Title", ascending
or descending alphabetically?  Or is this a frontend/jQuery type of thing?

Thanks



Re: Exact phrase highlighting

2011-06-27 Thread okayndc
Has this bug been fixed?  I'm using Solr 3.1 and it still seems to be an
issue.  If I do a search for "bird house", I still get <em>bird</em>
<em>house</em> returned instead of <em>bird house</em>, which is the desired
result.
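One client-side workaround (not a Solr setting) is to merge highlight tags that are separated only by whitespace before display. A sketch, assuming the default <em>/</em> markers; note that it will also merge adjacent terms that matched independently rather than as a phrase. The class name is hypothetical.

```java
final class MergeHighlights {

    /** Joins <em>a</em> <em>b</em> into <em>a b</em>, keeping the original whitespace. */
    static String merge(String snippet) {
        return snippet.replaceAll("</em>(\\s+)<em>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(merge("<em>bird</em> <em>house</em>")); // <em>bird house</em>
    }
}
```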



Re: velocity: hyperlinking to documents

2011-06-23 Thread okayndc
Yes, from the handy /browse view.

 I'll give this a try. Thanks Erik! 



velocity: hyperlinking to documents

2011-06-21 Thread okayndc
Hello,

I'm not sure of the correct Velocity syntax to link, let's say, a title
field to the actual document itself. I have hostname, category (which
is also the directory where the file sits) and filename fields in my schema.
Can I potentially use these fields to get at the document itself?
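Assuming the three fields hold a bare host name, the category directory and the file name, and that the files are served over plain http at /category/filename on that host (an assumption about your server layout), the link target is those values joined, with percent-encoding for characters such as spaces. In a Velocity template you would concatenate the same field values; the Java below sketches the URL building, with hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DocLink {

    /** Builds http://<hostname>/<category>/<filename>, percent-encoding illegal path characters. */
    static String href(String hostname, String category, String filename) {
        try {
            return new URI("http", hostname, "/" + category + "/" + filename, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a link from these field values", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field values.
        System.out.println(href("docs.example.com", "reports", "annual report.pdf"));
    }
}
```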
