Am 27.07.2012 08:20, Dennis E. Hamilton wrote:
There are several matters here.

First, the rules for OpenFormula apply to literals as argument and to 
references to cells where the value is of type text and the expected type is 
Number.

The current rules for implicit conversions used to be clear and consistent in OOo 2. The current compromise which converts integers and ISO dates is still reasonable. Excel gives different results depending on the language version.


It does not apply to how *entered* values are converted.  In fact, there is no 
guidance at all about how *entered* values are converted.  Likewise, how cell 
values are formatted for the spreadsheet's presentation, for export in CSV, 
etc., are independent of these rules in OpenFormula.

It strikes me that entry of values through an user interface is the time to 
deal with these issues and that may resolve most cases.

For use of literals in formulas, there are ways for users to ensure that the 
desired interpretation is made by explicit conversions.  The same conditions 
apply when a cell value is text and it is intended to be used as a number.  
Safe practice could be encouraged even if a given product provides the intended 
result, since there is no assurance of consistent treatment in an interchange 
situation.

(Multi-cell operandsare more complicated, when some cell entries in a range 
have values of different type.)

Without too much thought, I image three things that can be done, along with 
user guidance:

  1. The rules for conversion of entered values based on the specified cell 
format need to be clear, and that might also include understanding the cell 
display formatting in properly handling the cell input conversion.  Since the 
storage of Number types is canonical and not subject to locale considerations, 
this can isolate much of the problem.  The same applies to export into 
non-OpenFormula forms, such as CSV.


The rules for German and English locale are unclear ("intuitive") but easy, reasonable and flexible enough to get adjusted to.

Things become weird in multi-lingual sheets because the input rules differ from cell to cell. IMHO, the interpretation of input strings should always follow the globally set locale regardless of the current cell formatting. Formula literals behave that way, constant cells do not.


  2. Users need to be cautioned about what happens when cells and literals are 
of different types than the expected values.  It might also be useful for there 
to be warnings when text for Number expected arises.  There can also be user 
guidance on how to detect and prevent pathological cases.


Users want this or that instantly without thinking nor reading. Next time they want something completely different. Most serious issue is that new spreadsheet users don't understand the difference between (literal) string and (formatted) number which are the only 2 types in Calc. This accounts for a huge part of all support questions since decades. There is no way to do it right other than insisting on clear technical distinction. The documentation is unclear. There must not be any use of literal strings nor mention of "text formatting". The excelish user interface is extremely misleading because a switch that turns off input interpretation without applying any number format at all is hidden in the number format settings. Number format "@" is not a number format at all nor does it convert anything between number and text (which is what even advanced users believe).

I recommend making the default behavior consistent with widespread practice, 
including consideration of the importance of Excel interoperability.  If there 
seems to be no simple way out of it, I think Error values are the appropriate 
response, so that an user is given the opportunity to provide an explicit 
resolution rather than contend with silent discrepancies in their own use or 
when their spreadsheet is interchanged.

As far as I know, Calc's input methods are the same as in Excel.
The LibreOffice folks are about to shoot themselves into the foot. In version 3.6 with German locale I can not enter dates without leaving the num-pad while needing at least 2 more key strokes for recent dates. A date needs to be entered with 2 or 3 numbers and 2 distinct separators. Apart from the extra key strokes, this means that German users can not type dates on the num-pad anymore. The intention is to calm down the users who enter/paste/import sport results such as 12/9 or 12-9. For some reason they refuse to enter/import/paste text even when you explain them how to do it. They want the software to decide what a text is.


  3. There are probably additional functions that could be introduced into 
OpenFormula, even as implementation-specified extensions (in accordance with 
provisions for that), to provide functions that will pass Numbers intact and 
will apply some formatting rule or function when text is found instead.  (The 
conversion of cells of explicit date and time types is already straightforward 
although it might need review.)  A few new functions may be sufficient to 
achieve complete, straightforward control on behalf of users, including forcing 
of an Error result.  The extensions could be agreed among the different 
implementations of OpenFormula, including in Apache OpenOffice, LibreOffice, 
Gnumeric, Excel 2013, Google Docs, etc.  (But first, it should be verified 
whether there are means enough already.)


IMHO, it is a major problem that virtually nobody writes any add-ins.
The SUMIFS COUNTIFS of OOXML are trivial and they offer nothing new. These should be introduced for compatibility.

The current evaluation of string input by keyboard, clipboard or csv is undocumented.
Let me summarize it from a user's point of view:

1) By default, everything is evaluated as number if possible.
The = prefix triggers formula evaluation.
The ' prefix supresses evaluation all together.

2) We enter decimals with point or comma depending on the global locale which may be overridden by the number format locale. 23/ always enters the 23th day of the month which is very much appreciated by num-pad typers, but some people hate it. Please ignore the haters. 23/4 enters this year's 23th of April except for US locale where this has to be entered as 4/23.
Locale specific date separators may be used instead of the slash.
The colon is used as time separator except for the Italian locales which use a point.
ISO dates work with any locale setting.
The words TRUE and FALSE are recognized in the respective locale context as synonyms for 1 and 0. This is very different from Excel where boolean is a third data type apart from number and string.
Excel: =1=TRUE ==> FALSE
Calc: =1=TRUE ==> TRUE
Comment:
Input 1.234 may give different values on the same sheet depending on the individual cell's number format locale whereas the formula constant =1.234 follows the global locale. IMHO, the evaluation should be consistent throughout the office suite depending on the global locale only. Input 1.234 should always yield the same value within the same locale context even if an explicitly set number format locale may display 1,234. Interestingly, I have never read any complaint about booleans beeing different from Excel.

3) The general number format shows decimal numbers without leading nor trailing zeros and a configurable maximum of decimals behind the point/comma. The general number format automagically applies one particular number format when the input string indicates boolean, date, time, date+time, currency or scientific number format.
Comment:
This is wanted behaviour for the general number format. IMHO, this works perfectly well for English and German numerals. Some people expect crude ways of auto-formatting where the number gets interpreted and formatted as entered. This would make the spreadsheet effectively unusable.

4) Some number format categories change the evaluation method:
4a) Fraction: 3/4 yields 0.75 instead of 3rd April.
4b) Percent: A trailing % is added to all keyboard input effectively dividing the entered numeral by 100. 4c) Text (code @), which is no number format at all: It is more like a switch to turn off all evaluation for new input analog to the ' prefix. Newly entered numbers and prefixes are not evaluated. Any existing number or numeric formula result will be shown in the general number format. Even advanced users tend to believe that this number format converts to text and they expect that the existing string will be evaluated when removing the text attribute.

I think, all this is adequate, well established, very well thought, as easy as possible and as complicated as necessary given that each sheet cell is supposed to take any value of any type and sub-type for any locale.

Suggestions:
As already mentioned, the input method should be the same for all office numerals depending on the global locale only.
That is issue https://issues.apache.org/ooo/show_bug.cgi?id=72640

The default date of the general number format should have a 4-digit year to prevent certain input mistakes.

If AOO implements something analog to the latest LibreOffice date evaluation, then it should include a generous evaluation for format category "Date". Keep the old behaviour (day number with slash) when the cell is explicitly formatted as date.

On top of all number format options the number format dialog (which is the same in Calc, Writer and Base) should have a check box instead of number format category "Text". When you turn on this option, all the other options should be disabled (which is exactly what number format "@" actually does). The option should be labeled "New input as literal text" or something like that. Alternatively, it could be an evaluation switch which enables number formatting when evaluation is on. The second most important option on that dialog is the locale option. All number formatting depends on the locale which should appear On top of the options besides the evaluation switch.

CSV export has an option "export as shown" which is what most people want. Usually we want to turn it off in order to get the maximum of decimal digits but this also removes all other formatting attributes. IMHO 0/1 booleans and decimal fractions are perfectly acceptable but dates and times should be exported as ISO strings rather than decimals based on 1899-12-30 00:00 which makes them unusable for any other application.

I try to explain to confused users that csv is a database exchange format in plain text and that Calc is neither database nor text editor. Calc has to evaluate strings in order to do what it is supposed to do. If HSQLDB accepted more than point decimals and ISO dates we could have a very powerful csv editing tool on board. HSQLDB can link database fields to csv which preserves the decimals, dates, delimiters and encoding regardless of the displayed format in Base.

That's it for now.

Greetings,
A.S


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to