This is an automated notification sent by LCG Savannah.
It relates to:
                task #12232, project CDS Invenio

==============================================================================
 LATEST MODIFICATIONS of task #12232:
==============================================================================

Follow-up Comment #5, task #12232 (project cdsware):

(one should still be careful with line breaks, and HTML-formatted data. As a
consequence, .csv format could live beside, but not replace, the "Excel"
format)

==============================================================================
 OVERVIEW of task #12232:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?12232>

                 Summary: CDS Excel output format produces invalid HTML, not Excel files
                 Project: CDS Invenio
            Submitted by: vengmark
            Submitted on: 2009-10-30 08:13
         Should Start On: 2009-10-29 00:00
   Should be Finished on: 2009-10-29 00:00
                Category: BibFormat
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


How to reproduce:
1. Do a search in CDS, e.g.
<http://cdsweb.cern.ch/search?cc=Books+%26+Proceedings&ot=020&ot=970&ln=en&p=020%3A'1->z'&f=&action_search=Search&c=Books+%26+Proceedings&c=&sf=&so=d&rm=&rg=10&sc=0&of=hx>.
2. Set "Output format" to "Excel"
3. Click "Search"
4. Save result.xls to disk

The resulting file is not an Excel file, and should therefore not have the
extension .xls. It is also not an HTML file, first because it's missing the
<html>, <head>, <title>, and <body> tags, and second because it's not valid
according to the W3C validator <http://validator.w3.org/>, even after adding
those tags.
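
To illustrate the structural elements named above, here is a minimal sketch of
wrapping table rows in a complete HTML document. The function name
wrap_table_rows and the default title are hypothetical, not part of CDS
Invenio, and the sketch makes no claim about passing the W3C validator; it only
shows the <html>, <head>, <title>, and <body> elements around a table.

```python
from html import escape


def wrap_table_rows(rows, title="Search results"):
    """Return a complete HTML document containing `rows` as a table.

    Hypothetical helper for illustration only; not the CDS Invenio code.
    """
    # Escape every cell so that characters like < and & cannot break the markup.
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n<table>\n"
        f"{body_rows}\n"
        "</table>\n</body>\n</html>\n"
    )


document = wrap_table_rows([["ISBN", "Title"], ["1-234", "A <test> & more"]])
```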

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2009-10-30 11:07              By: Jerome Caffaro <jcaffaro>
(one should still be careful with line breaks, and HTML-formatted data. As a
consequence, .csv format could live beside, but not replace, the "Excel"
format)

-------------------------------------------------------
Date: 2009-10-30 10:47              By: Victor Engmark <vengmark>
> csv does not let us have commas in column data

Here's how to use commas in column data: '[CSV uses] a " (double quote)
character around fields that contain reserved characters (such as commas or
newlines)' <http://en.wikipedia.org/wiki/Comma-separated_values>. Other
formats use backslash to escape this sort of content.

Test data (save as test.csv and open):
test,"test, with, commas","test
with
newlines",another test

This works in OpenOffice and MS Office, i.e., shows up as a single row with
four columns.
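
The same quoting rule can be checked programmatically. This is a minimal
sketch using Python's standard csv module (an assumption for illustration; the
thread does not say how CDS Invenio would generate CSV). It writes the test row
above and reads it back as a single row with four columns.

```python
import csv
import io

# One row whose fields contain commas and newlines, as in the test data above.
rows = [["test", "test, with, commas", "test\nwith\nnewlines", "another test"]]

# Write the row; the writer quotes only the fields that need it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()

# Read it back: the quoted commas and newlines stay inside their fields.
parsed = list(csv.reader(io.StringIO(text)))
```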

> The name "Excel"

I'm not arguing about the name in the selection list, only the file
extension. Since it doesn't validate to any existing format, it should
ideally be .txt. You could also use .html, since it is almost compatible with
that. Last I checked, Excel had no problem importing tables in HTML files.

-------------------------------------------------------
Date: 2009-10-30 10:34              By: Jerome Caffaro <jcaffaro>
The decision to use HTML tags instead of csv was taken in order to overcome
this big restriction: csv does not let us have commas in column data. Another
delimiter could still be used, but then it should maybe not have a .csv
extension. HTML also has the nice side-effect of enabling the use of
formatted text (colors, face, links).

The name "Excel" was chosen because the main target for this export is
people who don't know what CSV means. Still, "Excel-compatible" or "Tabular
data" would be more appropriate: either would avoid confusion and still
please our target audience.

A .csv format could still be added, as well as a .xlsx format.

-------------------------------------------------------
Date: 2009-10-30 10:18              By: Victor Engmark <vengmark>
Even though it can be imported into Excel, it's not at all an XLS file, and
should not be named as such. CSV would be much better, since it would have
179 fewer characters per line (HTML tr/td) and 83 fewer characters per file
(HTML start/end) of output, and would be supported by just about any
spreadsheet application under the sky.

Regarding the restrictions, I'm not sure valign, border-color, border-style
and border-width are really necessary for nice output.

-------------------------------------------------------
Date: 2009-10-30 08:36              By: Jerome Caffaro <jcaffaro>
"Excel" output is in fact an "Excel-compatible" output, that is an output
that can be converted to an Excel spreadsheet (unless we do an .xlsx output,
we cannot produce a real Excel output AFAIK).

The output has been reported to be incompatible with Apple "Numbers"
application, and must be correctly imported into OpenOffice "Calc"
application.

For less surprising results for non-Windows users, we could switch to a
comma-, tab-, or space-delimited output, with all the consequent restrictions.

See task #3493


    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
2407                                | -COM-
3964                                | -SUB-




==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?12232>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/

Reply via email to